Globally, around 2.2 billion people have some form of vision impairment, and roughly 90% of them live in low- and middle-income countries. An easily accessible, low-cost solution is therefore very important for the visually impaired people of these countries.
Visually impaired people cannot perceive their surroundings and navigate the way sighted people can, which results in reduced mobility. In this project, I will show how artificial intelligence and computer vision can be used to address this problem. With this project in place, a blind person can become less dependent on their environment and on other people.
In this project, I have combined object detection with text-to-speech conversion to describe the environment to a visually impaired person, who can hear the generated speech through an earphone.
Developing the Object Detection Model Using Edge Impulse Studio
I used Edge Impulse Studio to train the object detection model. Edge Impulse is a leading development platform for machine learning on edge devices.
To start a project, sign in with your account credentials (or create a free account) at Edge Impulse and create a new project. Data is the main fuel for any machine-learning project.
In Edge Impulse you can upload existing data or record new data. For my project, I prepared a dataset of a few common household objects such as chairs, tables, beds, and basins. The more objects the dataset covers, the more useful the model will be. The size of the dataset is also important: the more images we take of a particular object, the better accuracy we can expect.
I uploaded 188 images of 6 objects for my initial project; I will upload more images with more objects later. The data can be uploaded and labeled from the Data acquisition tab of Edge Impulse Studio. You can let the Studio split your data automatically between Train and Test, or do it manually.
After uploading and labeling the data, the next step is to design an Impulse. An impulse takes raw data (in this case, images), extracts features (resizes the pictures), and then uses a learning block to classify new data.
In this phase, you should define how to:
- Pre-process the data: resize the individual images from 320 x 240 to 96 x 96 and squash them (square form, without cropping).
- Design a model: in this case, add an "Object Detection" learning block.
The complete Impulse will look like the following.
After saving the Impulse, the Studio moves automatically to the next section, Generate features, where all samples are pre-processed, resulting in a dataset of individual 96x96x3 images, or 27,648 features per image.
Now we will train our model. We need to set the neural network parameters from the settings option and click the Train button. The training time depends on the settings and on the size of the dataset: a larger dataset takes longer to train but generally yields better accuracy. Neural network parameters such as the number of training cycles and the learning rate also influence accuracy. I got the following result after several trials; it took around 10 minutes to produce it for my dataset. The result is not very satisfactory, but it is good enough to test the project. For a practical application, we will definitely add more sample images to reach usable accuracy.
For real-time detection of objects (inferencing), we need to deploy the model to the XIAO ESP32S3 Sense. Fortunately, Edge Impulse lets us download the model as an Arduino library that can easily be integrated or customized for developing firmware for edge devices supported by the Arduino IDE.
So, let's download the Arduino library for our board. To do so, on the Deploy tab select Arduino library, choose the Quantized (int8) model, enable the EON Compiler, and press Build.
Open your Arduino IDE and, under Sketch, go to Include Library > Add .ZIP Library. Select the file you downloaded from Edge Impulse Studio, and that's it!
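Once the library is installed, a sketch uses the model through a single include. The header is named after your Edge Impulse project, so the name below is only a placeholder; check the actual file name inside the downloaded library or in the generated example sketch.
// Hypothetical example: the umbrella header generated by Edge Impulse is named
// after the project title, so your include line will look something like this.
#include <Your_Project_Name_inferencing.h>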
Under the Examples tab in the Arduino IDE, you should find an example sketch (esp32 > esp32_camera) under your project name.
Project link: https://studio.edgeimpulse.com/studio/503872/impulse/1/deployment
To get the camera connection right, you should change lines 39 to 55 of the sketch, which define the camera model and pins, so that they match the XIAO ESP32S3 Sense. Copy and paste the lines below, replacing lines 39-55:
// Camera pin mapping for the Seeed Studio XIAO ESP32S3 Sense (OV2640)
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 10
#define SIOD_GPIO_NUM 40
#define SIOC_GPIO_NUM 39
#define Y9_GPIO_NUM 48
#define Y8_GPIO_NUM 11
#define Y7_GPIO_NUM 12
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 16
#define Y4_GPIO_NUM 18
#define Y3_GPIO_NUM 17
#define Y2_GPIO_NUM 15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM 47
#define PCLK_GPIO_NUM 13
After updating the camera configuration, I tried to compile the code, but compilation failed with the following error message.
I tried to solve it in different ways, and finally I was able to compile by downgrading the esp32 boards package in the Boards Manager to version 2.0.17. Then I uploaded the code to the board.
The XIAO ESP32S3 Sense detects objects in its surroundings and reports each object's name together with its position. The Raspberry Pi's job is to receive the object name and position through UART and convert the text to speech.
For example: refrigerator on the left, bed in front
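The stock esp32_camera example only prints raw bounding-box results, so the sketch needs a small addition to produce messages in the format shown above. The following is a minimal sketch of that idea rather than the exact code I used: it assumes the bounding-box fields (label, x, width) provided by the Edge Impulse Arduino SDK, and the left/front/right thresholds are purely illustrative.
// Illustrative helper: call it from the example's loop() after run_classifier()
// has filled "result". Field names come from the Edge Impulse SDK.
void report_detections(ei_impulse_result_t &result) {
    for (size_t i = 0; i < result.bounding_boxes_count; i++) {
        ei_impulse_result_bounding_box_t bb = result.bounding_boxes[i];
        if (bb.value == 0) continue; // empty slot, nothing detected here
        // Use the horizontal centre of the bounding box to pick a position word.
        float cx = bb.x + bb.width / 2.0f;
        const char *position = "in front";
        if (cx < EI_CLASSIFIER_INPUT_WIDTH / 3.0f) {
            position = "on the left";
        } else if (cx > 2.0f * EI_CLASSIFIER_INPUT_WIDTH / 3.0f) {
            position = "on the right";
        }
        // One message per detection, e.g. "refrigerator on the left".
        Serial.print(bb.label);
        Serial.print(" ");
        Serial.println(position);
    }
}
On the Raspberry Pi side, the Python script shown later only needs to match the object name in each received line.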
I used a Raspberry Pi 1 Model B here, and its performance is satisfactory. After installing the OS on the Raspberry Pi, I configured the audio output and set the volume to 100%:
sudo raspi-config
Then I installed the free software package Festival on the Pi. Festival, written by the Centre for Speech Technology Research in the UK, offers a framework for building speech synthesis systems. It provides full text-to-speech through a number of APIs: from the shell, through a command interpreter, as a C++ library, from Java, and via an Emacs editor interface.
Install festival using the following command:
sudo apt-get install -y libasound2-plugins festival
After installing Festival, I connected an audio amplifier and tested it with the following command; the sound was great.
echo "Hello World!" | festival --tts
Then I installed the Python serial module (pyserial) on the Raspberry Pi.
I connected the XIAO ESP32S3 Sense to the Raspberry Pi through a USB-C cable.
Finally, I attached headphones to the audio output port of the Raspberry Pi.
Writing Code for Raspberry Pi
Before writing the code, we need to know the serial port of the XIAO Sense board. Once the XIAO Sense board is plugged into the Raspberry Pi, we can run the following command in a terminal.
dmesg | grep tty
The result:
Now that we know the serial port, it's time to write the code. I wrote the following Python script for the Raspberry Pi to convert the received text to speech.
#!/usr/bin/env python
import os
import serial

# Open the serial port of the XIAO ESP32S3 Sense (found with "dmesg | grep tty")
ser = serial.Serial(
    port='/dev/ttyACM0',
    baudrate=115200,
    parity=serial.PARITY_NONE,
    stopbits=serial.STOPBITS_ONE,
    bytesize=serial.EIGHTBITS,
    timeout=1
)

while True:
    # Read one detection message from the XIAO, e.g. b"refrigerator on the left"
    receive_msg = ser.readline()
    print(receive_msg)
    # Speak a phrase through Festival when a known object name is received
    if b'basin' in receive_msg.lower():
        os.system('echo "basin in front" | festival --tts')
    if b'bed' in receive_msg.lower():
        os.system('echo "bed in front" | festival --tts')
    if b'chair' in receive_msg.lower():
        os.system('echo "chair in front" | festival --tts')
    if b'dining table' in receive_msg.lower():
        os.system('echo "dining table in front" | festival --tts')
    if b'oven' in receive_msg.lower():
        os.system('echo "oven in front" | festival --tts')
    if b'refrigerator' in receive_msg.lower():
        os.system('echo "refrigerator in front" | festival --tts')
The Final Setup
The XIAO ESP32S3 Sense board will get power from the Raspberry Pi. We can use a power bank to power the Pi.