Scene perception during indoor activities is a difficult problem for visually impaired people. Technology that provides hints and target identification can be of great help.
To address this, we propose a solution: use computer vision to identify discrete objects in an indoor scene and announce them to the user with voice prompts. From these prompts, the user can build a perception of the indoor scene and locate a target object. For example, if a knife and fork are detected, the user is likely in a dining room or kitchen. The reported coordinates give the user the approximate location of the target object so they can take further action.
The implementation consists of three modules: the first is the object recognition module, which detects and classifies targets in the scene through the camera; the second is the analysis module, which analyzes the recognition results and generates a report; and the third is the voice module, which reads the results to the user through voice prompts.
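To make the scene-inference idea concrete, a minimal sketch of an object-to-room lookup could look like the following; the table entries are hypothetical examples, not part of the firmware described below.

#include <string.h>

// Hypothetical example: map a detected object class to a likely room,
// so the voice prompt can hint at the scene, not just the object.
const char *roomForObject(const char *object)
{
  if (strcmp(object, "fork") == 0 || strcmp(object, "knife") == 0)
    return "kitchen or dining room";
  if (strcmp(object, "toothbrush") == 0)
    return "bathroom";
  if (strcmp(object, "bed") == 0)
    return "bedroom";
  return "unknown room";
}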
1) Grove Vision AI V2
It is an MCU-based vision AI module powered by Arm Cortex-M55 & Ethos-U55. It supports TensorFlow and PyTorch frameworks and is compatible with Arduino IDE. With the SenseCraft AI algorithm platform, trained ML models can be deployed to the sensor without the need for coding. It features a standard CSI interface, an onboard digital microphone and an SD card slot, making it highly suitable for various embedded AI vision projects.
2) OV5647 FOV Camera
This camera module uses a fisheye lens to achieve a 62° field of view and an OV5647 sensor with a 2592 x 1944 active pixel array, and it supports the Raspberry Pi 3B+/4B.
3) XIAO ESP32S3 module
The Seeed Studio XIAO series are diminutive development boards that share a similar hardware structure and are literally thumb-sized. The code name "XIAO" represents half of its character, "tiny"; the other half is "puissant". The Seeed Studio XIAO ESP32S3 Sense integrates a camera sensor, a digital microphone, and SD card support. Combining embedded ML computing power with photography capability, this development board is a great tool for getting started with intelligent voice and vision AI.
4) Gravity Text-to-Speech module
The Text-to-Speech module enables easy integration of voice functionality into various projects. It supports both Chinese and English, providing clear and natural pronunciation. The module can also broadcast the current time and environmental data, and when combined with a speech recognition module, it enables conversational interactions with your projects. The module uses I2C and UART communication modes with a Gravity interface, ensuring compatibility with most mainstream controllers such as Arduino, micro:bit, FireBeetle series, LattePanda, and Raspberry Pi. It also includes a built-in speaker, eliminating the need for an external one. Primary applications include robot voice, voice broadcast, voice prompts, and text reading.
1) Step 1: Select Model and Deploy
Luckily, we can find many pre-trained models in the SenseCraft AI Model Assistant.
The Fork Detection model, based on the Swift-YOLO algorithm, is designed for the Seeed Studio Grove Vision AI (V2) device to accurately detect and recognize forks. Focused on fork recognition, it can accurately detect and tag forks in real-time video streams, and it offers high compatibility and stability on the Grove Vision AI (V2). Deployment is extremely simple, requiring only a few straightforward steps with no complex configuration or debugging. The model suits multiple application scenarios, including dining management, retail, kitchen organization, and hospitality. For instance, in dining management it can monitor fork inventory and cleanliness, ensuring proper table setup and guest satisfaction; in retail, it can optimize fork display and stock levels. Evaluation metrics: mAP50: 0.53981, mAP50-95: 0.33936.
Connect the Grove Vision AI V2 board and download this model to it.
2) Step 2: Connect Analysis Module
Connect the Grove Vision AI (WE2) module to the default I2C interface of your Arduino board (the XIAO ESP32S3 module) using the 4-pin cable. Make sure each wire goes to the correct pin; a quick I2C scan (see the sketch after the pin list) can confirm the module is visible on the bus.
- SCL -> SCL (Grove Vision AI WE2)
- SDA -> SDA (Grove Vision AI WE2)
- VCC -> VCC (Grove Vision AI WE2, 3.3V)
- GND -> GND (Grove Vision AI WE2)
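The sketch below is a generic Arduino I2C scanner, not specific to this project; the address at which the Grove Vision AI V2 responds (often reported as 0x62) depends on its firmware, so treat the expected address as an assumption and check the Seeed documentation.

#include <Wire.h>

void setup()
{
  Wire.begin();
  Serial.begin(9600);
  Serial.println("Scanning I2C bus...");
  // Probe every 7-bit address and report the ones that acknowledge.
  for (uint8_t addr = 1; addr < 127; addr++)
  {
    Wire.beginTransmission(addr);
    if (Wire.endTransmission() == 0)
    {
      Serial.print("Device found at 0x");
      Serial.println(addr, HEX);
    }
  }
  Serial.println("Scan done.");
}

void loop() {}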
The XIAO ESP32S3 module receives messages from the Grove Vision AI V2 board over the I2C protocol, builds an analysis report, and sends it to the TTS module in real time.
3) Step 3: Connect TTS Module
Connect the XIAO ESP32S3 module to the Gravity Text-to-Speech module over UART or I2C (see the figure below; note the onboard switch that selects UART or I2C mode, and make sure its position matches your wiring). The analysis result can then be heard as synthesized speech.
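Before integrating everything, the TTS module can be tested on its own. Here is a minimal sketch, assuming the module's mode switch is set to I2C and the DFRobot_SpeechSynthesis library listed in the next section is installed; if the wiring and mode are correct, the test phrase should be spoken aloud.

#include "DFRobot_SpeechSynthesis.h"

DFRobot_SpeechSynthesis_I2C ss;

void setup()
{
  // Initialize the synthesizer over I2C (mode switch must be on I2C).
  ss.begin();
  ss.setVolume(5);
  ss.speak(F("Text to speech module ready"));
}

void loop() {}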
1) Receive Grove Vision AI V2 messages, analyze, and report
First, install the required libraries:
- Seeed_Arduino_SSCMA v1.0.0
- ArduinoJson v7.1.0
- DFRobot_SpeechSynthesis v1.0.1
Since we only care about high-confidence objects, we set the score threshold to 0.6; this parameter can be adjusted in practice. For each object above the threshold, the report includes the target class, the score, and the center point (x, y) of the bounding box. Continuous reporting would be too noisy, so a 3-second delay reduces the report frequency.
#include <Arduino.h>
#include <Seeed_Arduino_SSCMA.h>
#include "DFRobot_SpeechSynthesis.h"

// Only objects scoring above this threshold are reported; tune in practice.
#define THRESHOLD 0.6

DFRobot_SpeechSynthesis_I2C ss;
SSCMA AI;

void setup()
{
  // Initialize the Grove Vision AI V2 module.
  AI.begin();
  // Initialize the speech synthesis module.
  ss.begin();
  // Set speech volume.
  ss.setVolume(5);
  // Set speech speed.
  ss.setSpeed(5);
  // Use a female voice.
  ss.setSoundType(ss.eFemale1);
  // Set speech tone.
  ss.setTone(5);
  // Speak in English word mode.
  ss.setLanguage(ss.eEnglishl);
  // Initialize serial for debugging and monitoring.
  Serial.begin(9600);
  Serial.println("Object Detect App start!");
}

void loop()
{
  String analysis = "Detected ";
  int reported = 0;
  // Invoke inference once, without filtering, and fetch the image.
  if (!AI.invoke(1, false, true))
  {
    for (int i = 0; i < AI.boxes().size(); i++)
    {
      // Skip low-confidence detections.
      if (AI.boxes()[i].score > THRESHOLD)
      {
        analysis += String(reported++);
        analysis += String(" target is ");
        analysis += String(AI.boxes()[i].target);
        analysis += String(" score is ");
        analysis += String(AI.boxes()[i].score);
        analysis += String(" position is ");
        // Report the center point of the bounding box.
        analysis += String(AI.boxes()[i].x + AI.boxes()[i].w / 2);
        analysis += String(" ");
        analysis += String(AI.boxes()[i].y + AI.boxes()[i].h / 2);
      }
    }
    // Speak only when at least one object passed the threshold.
    if (reported > 0)
    {
      ss.speak(analysis.c_str());
    }
  }
  // Wait 3 seconds between reports to keep the voice output from
  // becoming too noisy.
  delay(3000);
}
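Raw pixel coordinates can be hard to interpret by ear. As a possible refinement (an assumption, not part of the code above), the center x could be translated into a coarse spoken direction before being appended to the report; frameWidth here is a hypothetical parameter standing for whatever image width the deployed model reports boxes in.

#include <Arduino.h>

// Hypothetical helper: convert a bounding-box center x into a spoken
// direction, which is easier to follow by ear than a raw pixel value.
String directionWord(int centerX, int frameWidth)
{
  if (centerX < frameWidth / 3)
    return String("on the left");
  if (centerX < 2 * frameWidth / 3)
    return String("in the middle");
  return String("on the right");
}

In the loop above, analysis += directionWord(AI.boxes()[i].x + AI.boxes()[i].w / 2, frameWidth); could then replace the raw coordinate strings.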