With the development of computer vision, artificial intelligence has achieved remarkable results in object recognition. For people with visual impairments, road traffic safety is a prerequisite for freedom of movement. Outdoor roads may carry cars, bicycles, motorcycles, pedestrians, and more, and for visually impaired individuals these moving targets can become obstacles. If visually impaired people can actively identify and avoid such targets, collisions can be reduced.
After settling on this goal and selecting the hardware, we developed a project plan based on technical feasibility. The idea is to use computer vision to analyze road conditions, classify the detected targets by category and direction, and give voice prompts that visually impaired users can use as a reference when passing through, interactively warning them about obstacles ahead of the camera and reducing injuries caused by collisions.
The specific plan is as follows: the user wears a camera on the head, pointed forward, which captures real-time road conditions. By turning the head, the user moves the camera's viewing angle and thus the detection direction, meeting flexibility requirements. A computer vision model analyzes the live traffic scene and sends the results through a serial port to a backend processing module, which parses them and reports the road conditions, such as the categories and number of targets ahead, to the user by voice, achieving active obstacle avoidance.
1) Grove Vision AI V2
It is an MCU-based vision AI module powered by the Arm Cortex-M55 and Ethos-U55. It supports the TensorFlow and PyTorch frameworks and is compatible with the Arduino IDE. With the SenseCraft AI algorithm platform, trained ML models can be deployed to the module without any coding. It features a standard CSI interface, an onboard digital microphone, and an SD card slot, making it highly suitable for various embedded AI vision projects.
2) OV5647 FOV Camera
This camera module uses a fisheye lens to achieve a 62° field of view. It is built around the OV5647 sensor with a 2592 x 1944 active pixel array, and it also supports the Raspberry Pi 3B+/4B.
3) XIAO ESP32S3 Module
Seeed Studio XIAO series boards are diminutive development boards that share a similar hardware structure and are literally thumb-sized. The code name "XIAO" represents half of their character, "tiny"; the other half is "puissant". The Seeed Studio XIAO ESP32S3 Sense integrates a camera sensor, a digital microphone, and SD card support. Combining embedded ML computing power with photography capability, this development board is a great tool for getting started with intelligent voice and vision AI.
4) Gravity Text-to-Speech Module
The Text-to-Speech module enables easy integration of voice functionality into various projects. It supports both Chinese and English, providing clear and natural pronunciation. The module can also broadcast the current time and environmental data, and when combined with a speech recognition module it enables conversational interaction with your projects. It offers both I2C and UART communication modes through a Gravity interface, ensuring compatibility with most mainstream controllers such as Arduino, micro:bit, the FireBeetle series, LattePanda, and Raspberry Pi. Additionally, it includes a built-in speaker, eliminating the need for an external one. Primary applications include robot voice, voice broadcast, voice prompts, and text reading.
1) Step 1: Select Model and Deploy
The project needs a model that analyzes real-time road images and extracts information about cars, bicycles, and pedestrians. Building such a model involves three steps: labeling a dataset, training on that dataset, and uploading the model.
Labeling a dataset is about obtaining data that can be trained into a model. There are two main ways to do this: use the labeled datasets provided by the Roboflow community, or use your own scenario-specific images as the dataset, in which case you need to label them manually yourself. Training the dataset and exporting the model covers how to train a model that can be deployed to the Grove Vision AI V2, based on the dataset from the first step, using the Google Colab platform. Uploading the model covers how to push the exported model file to the Grove Vision AI V2 using the SenseCraft Model Assistant.
Luckily, many pre-trained models are already available in the SenseCraft AI Model Assistant.
Search for and select the TrafficCamNet model. TrafficCamNet detects one or more physical objects from four categories within an image and returns a box around each object, along with a category label for each. The model is based on the NVIDIA DetectNet_v2 detector with ResNet18 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image: the grid predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class for each cell. The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels. The model recognizes four categories: car, bicycle, person, and road sign. TrafficCamNet is already trained and ready to download, so connect your device and download the model to your Grove Vision AI V2.
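To make the post-processing step concrete, here is a minimal, self-contained C++ sketch of greedy NMS over raw GridBox-style detections. It is illustrative only: the Detection struct and the confThresh/iouThresh values are assumptions for the example, and on the Grove Vision AI V2 this clustering is handled inside the deployed model pipeline.
#include <vector>
#include <algorithm>
// Hypothetical raw detection: normalized center-format box plus confidence.
struct Detection {
  float xc, yc, w, h; // normalized bounding-box parameters
  float conf;         // confidence for one output class
};
// Intersection-over-union of two center-format boxes.
static float iou(const Detection &a, const Detection &b) {
  float ix = std::max(0.0f, std::min(a.xc + a.w / 2, b.xc + b.w / 2) -
                            std::max(a.xc - a.w / 2, b.xc - b.w / 2));
  float iy = std::max(0.0f, std::min(a.yc + a.h / 2, b.yc + b.h / 2) -
                            std::max(a.yc - a.h / 2, b.yc - b.h / 2));
  float inter = ix * iy;
  float uni = a.w * a.h + b.w * b.h - inter;
  return uni > 0 ? inter / uni : 0;
}
// Greedy NMS: keep the highest-confidence box, drop boxes that overlap it.
std::vector<Detection> nms(std::vector<Detection> dets,
                           float confThresh = 0.5f, float iouThresh = 0.4f) {
  std::vector<Detection> kept;
  std::sort(dets.begin(), dets.end(),
            [](const Detection &a, const Detection &b) { return a.conf > b.conf; });
  for (const auto &d : dets) {
    if (d.conf < confThresh) continue;   // discard weak detections
    bool overlap = false;
    for (const auto &k : kept)
      if (iou(d, k) > iouThresh) { overlap = true; break; }
    if (!overlap) kept.push_back(d);     // keep non-overlapping survivors
  }
  return kept;
}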
2) Step 2: Connect the Analysis Module
Connect the Grove Vision AI (WE2) module to the default I2C interface of your Arduino board (the XIAO ESP32S3 module) using the 4-pin cable. Make sure each wire is connected to the correct pin:
- SCL -> SCL (Grove Vision AI WE2)
- SDA -> SDA (Grove Vision AI WE2)
- VCC -> VCC (Grove Vision AI WE2, 3.3V)
- GND -> GND (Grove Vision AI WE2)
The XIAO ESP32S3 module receives messages from the Grove Vision AI V2 board over the I2C protocol, builds an analysis report, and sends it to the TTS module in real time.
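As a quick wiring check before the full application, a minimal standalone sketch along these lines, using only Seeed_Arduino_SSCMA calls that also appear in the full program below, prints each detected box to the serial monitor. Treat it as a sketch, not final firmware.
#include <Seeed_Arduino_SSCMA.h>
SSCMA AI;
void setup()
{
  AI.begin();       // default transport is I2C
  Serial.begin(9600);
}
void loop()
{
  // invoke(times, filter, show): run one inference, no filtering, no image
  if (!AI.invoke(1, false, false))
  {
    for (int i = 0; i < AI.boxes().size(); i++)
    {
      Serial.print("target=");
      Serial.print(AI.boxes()[i].target);
      Serial.print(" score=");
      Serial.print(AI.boxes()[i].score);
      Serial.print(" x=");
      Serial.println(AI.boxes()[i].x);
    }
  }
  delay(1000);
}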
3) Step 3: Connect the TTS Module
Connect the XIAO ESP32S3 module to the Gravity Text-to-Speech module via UART or I2C (see the figure below; note that an onboard switch selects UART or I2C mode, so make sure the switch position matches your wiring). The analysis result can then be heard as synthesized speech.
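With the switch set to I2C (the mode this project uses), a short test sketch like the following, reusing the same DFRobot_SpeechSynthesis_I2C calls as the full program, confirms the connection audibly:
#include "DFRobot_SpeechSynthesis.h"
DFRobot_SpeechSynthesis_I2C ss;
void setup()
{
  ss.begin();      // join the I2C bus
  ss.setVolume(5); // mid-range volume
  // An audible connectivity check; any short phrase works here.
  ss.speak("Text to speech module ready");
}
void loop() {}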
1) Receive Grove Vision AI V2 messages, analyze, and report
First, install the required libraries:
- Seeed_Arduino_SSCMA v1.0.0
- ArduinoJson v7.1.0
- DFRobot_SpeechSynthesis v1.0.1
Since we only focus on the forward direction, we restrict the inference results to boxes whose x coordinate falls between 80 (left) and 140 (right); these parameters can be adjusted in practice. The report contains each target and its score. Continuous reporting would be too noisy, so a 3-second delay reduces the reporting frequency.
#include <Arduino.h>
#include <Seeed_Arduino_SSCMA.h>
#include "DFRobot_SpeechSynthesis.h"
// Define the forward-direction area between x = 80 and x = 140.
#define LEFT 80
#define RIGHT 140
DFRobot_SpeechSynthesis_I2C ss;
SSCMA AI;
void setup()
{
  // Initialize the Grove Vision AI V2 module.
  AI.begin();
  // Initialize the speech synthesis module.
  ss.begin();
  // Set speech volume.
  ss.setVolume(5);
  // Set speech speed.
  ss.setSpeed(5);
  // Set the voice type to female.
  ss.setSoundType(ss.eFemale1);
  // Set speech tone.
  ss.setTone(5);
  // Set speech to English word mode.
  ss.setLanguage(ss.eEnglishl);
  // Initialize serial for debugging and monitoring.
  Serial.begin(9600);
  // Announce that the application has started.
  Serial.println("Traffic Sensor App start!");
}
void loop()
{
  String analysis = "";
  // Run one inference: invoke once, no result filter, fetch the image.
  if (!AI.invoke(1, false, true))
  {
    int reported = 0;
    for (int i = 0; i < AI.boxes().size(); i++)
    {
      // Only report targets inside the forward-direction window.
      if (AI.boxes()[i].x > LEFT && AI.boxes()[i].x < RIGHT)
      {
        analysis += String(reported);
        analysis += String(" target is ");
        analysis += String(AI.boxes()[i].target);
        analysis += String(" score is ");
        analysis += String(AI.boxes()[i].score);
        reported++;
      }
    }
    // Speak only when at least one target was found in the window.
    if (reported > 0)
    {
      analysis = String("Detected ") + analysis;
      ss.speak(analysis.c_str());
    }
  }
  // Wait 3 seconds to reduce the reporting frequency.
  delay(3000);
}
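One possible refinement: AI.boxes()[i].target is a numeric class index, so the spoken report could map indices to TrafficCamNet's four category names. The index order below is an assumption for illustration; verify it against the model metadata shown in the SenseCraft Model Assistant before relying on it.
// Assumed index order for TrafficCamNet's four classes; confirm against
// the deployed model's metadata in the SenseCraft Model Assistant.
const char *LABELS[] = {"car", "bicycle", "person", "road sign"};
String targetName(int target)
{
  if (target >= 0 && target < 4)
    return String(LABELS[target]);
  return String("unknown target");
}
In the loop, analysis += targetName(AI.boxes()[i].target); would then replace the raw index in the spoken report.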