Globally, around 2.2 billion people have some form of vision impairment, and roughly 90% of them live in low- and middle-income countries. An easily accessible, low-cost solution is therefore very important for the visually impaired people of these countries.
Visually impaired people cannot perceive their surroundings and navigate the way sighted people can, which results in reduced mobility. In this project, I will show how artificial intelligence and computer vision can be used to address this problem. With this project in place, a blind person can become less dependent on their environment and on other people.
In this project, I have combined object detection with text-to-speech conversion to describe the environment to a visually impaired person, who can hear the generated speech through an earphone.
Developing the Object Detection Model Using Edge Impulse Studio
I used Edge Impulse Studio to train the object detection model. Edge Impulse is a leading development platform for machine learning on edge devices.
To start a project, sign in with your account credentials (or create a free account) at Edge Impulse and create a new project. Data is the main fuel for any machine-learning project.
In Edge Impulse you can upload existing data or record new data. For my project, I prepared a dataset of a few common household objects such as chairs, tables, beds, and basins. The more objects the dataset covers, the more useful the model will be. The size of the dataset is also important: the more images we take of a particular object, the better accuracy we can expect.
I uploaded 188 images of 6 objects for my initial project; I will upload more images with more objects later. The data can be uploaded and labeled from the Data acquisition tab of Edge Impulse Studio. You can let the Studio split your data automatically between Train and Test, or do it manually.
After uploading and labeling the data, the next step is to design an Impulse. An impulse takes raw data (in this case, images), extracts features (resizes the pictures), and then uses a learning block to classify new data.
In this phase, you should define how to:
- Pre-process the data: resize the individual images from 320 x 240 to 96 x 96 and squash them (square form, without cropping).
- Design a model: in this case, add an "Object Detection" learning block.
The complete Impulse will look like the following.
After saving the Impulse, the Studio moves automatically to the next section, Generate features, where all samples are pre-processed, resulting in a dataset of individual 96x96x3 images, or 27,648 features per image.
Now we will train our model. We need to set the neural network parameters from the settings option and click the Train button. The training time depends on the settings and on the size of the dataset: a larger dataset takes longer to train but generally yields better accuracy. Neural network parameters such as the number of training cycles and the learning rate also influence accuracy. I got the following result after several trials; it took around 10 minutes to produce it for my dataset. The result is not very satisfactory, but it is good enough to test the project. For a practical application, we will definitely add more sample images to reach usable accuracy.
For real-time detection of objects (inferencing), we need to deploy the model to the XIAO ESP32S3 Sense. Fortunately, Edge Impulse lets us download the model as an Arduino library that can easily be integrated or customized for developing firmware for edge devices supported by the Arduino IDE.
So, let's download the Arduino library for our board. To do so, on the Deploy tab select Arduino library, choose the Quantized (int8) model, enable the EON Compiler, and press Build.
Open your Arduino IDE and, under Sketch, go to Include Library > Add .ZIP Library. Select the file you downloaded from Edge Impulse Studio, and that's it!
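Once the library is installed, a sketch uses the model through a single include. The header is named after your Edge Impulse project, so the name below is only a placeholder; check the actual file name inside the downloaded library or in the generated example sketch.
// Hypothetical example: the umbrella header generated by Edge Impulse is named
// after the project title, so your include line will look something like this.
#include <Your_Project_Name_inferencing.h>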
Under the Examples tab in the Arduino IDE, you should find an example sketch (esp32 > esp32_camera) under your project name.
Project link: https://studio.edgeimpulse.com/studio/503872/impulse/1/deployment
To get the camera connection right, you should change lines 39 to 55 of the sketch, which define the camera model and pins, so that they match the XIAO ESP32S3 Sense. Copy and paste the lines below, replacing lines 39-55:
// Camera pin mapping for the Seeed Studio XIAO ESP32S3 Sense (OV2640)
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 10
#define SIOD_GPIO_NUM 40
#define SIOC_GPIO_NUM 39
#define Y9_GPIO_NUM 48
#define Y8_GPIO_NUM 11
#define Y7_GPIO_NUM 12
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 16
#define Y4_GPIO_NUM 18
#define Y3_GPIO_NUM 17
#define Y2_GPIO_NUM 15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM 47
#define PCLK_GPIO_NUM 13
After updating the camera configuration, I tried to compile the code, but compilation failed with the following error message.
I tried to solve it in different ways, and finally I was able to compile by downgrading the esp32 boards package in the Boards Manager to version 2.0.17. Then I uploaded the code to the board.
The XIAO ESP32S3 Sense detects objects in its surroundings and reports each object's name together with its position. The Raspberry Pi's job is to receive the object name and position through UART and convert the text to speech.
For example: refrigerator on the left, bed in front
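The stock esp32_camera example only prints raw bounding-box results, so the sketch needs a small addition to produce messages in the format shown above. The following is a minimal sketch of that idea rather than the exact code I used: it assumes the bounding-box fields (label, x, width) provided by the Edge Impulse Arduino SDK, and the left/front/right thresholds are purely illustrative.
// Illustrative helper: call it from the example's loop() after run_classifier()
// has filled "result". Field names come from the Edge Impulse SDK.
void report_detections(ei_impulse_result_t &result) {
    for (size_t i = 0; i < result.bounding_boxes_count; i++) {
        ei_impulse_result_bounding_box_t bb = result.bounding_boxes[i];
        if (bb.value == 0) continue; // empty slot, nothing detected here
        // Use the horizontal centre of the bounding box to pick a position word.
        float cx = bb.x + bb.width / 2.0f;
        const char *position = "in front";
        if (cx < EI_CLASSIFIER_INPUT_WIDTH / 3.0f) {
            position = "on the left";
        } else if (cx > 2.0f * EI_CLASSIFIER_INPUT_WIDTH / 3.0f) {
            position = "on the right";
        }
        // One message per detection, e.g. "refrigerator on the left".
        Serial.print(bb.label);
        Serial.print(" ");
        Serial.println(position);
    }
}
On the Raspberry Pi side, the Python script shown later only needs to match the object name in each received line.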
I used a Raspberry Pi 1 Model B here, and its performance is satisfactory. After installing the OS on the Raspberry Pi, I configured the audio output and set the volume to 100%:
sudo raspi-config
Then I installed the free software package Festival on the Pi. Festival, written by the Centre for Speech Technology Research in the UK, offers a framework for building speech synthesis systems. It provides full text-to-speech through a number of APIs: from the shell, through a command interpreter, as a C++ library, from Java, and via an Emacs editor interface.
Install festival using the following command:
sudo apt-get install -y libasound2-plugins festival
After installing Festival, I connected an audio amplifier and tested it with the following command; the sound was great.
echo "Hello World!" | festival --tts
Then I installed the Python serial module (pyserial) on the Raspberry Pi.
I connected the XIAO ESP32S3 Sense to the Raspberry Pi through a USB-C cable.
Finally, I attached headphones to the audio output port of the Raspberry Pi.
Writing Code for Raspberry Pi
Before writing the code, we need to know the serial port of the XIAO Sense board. Once the XIAO Sense board is plugged into the Raspberry Pi, we can run the following command in a terminal.
dmesg | grep tty
The result:
Now that we know the serial port, it's time to write the code. I wrote the following Python script for the Raspberry Pi to convert the received text to speech.
#!/usr/bin/env python
import os
import serial

# Open the serial port of the XIAO ESP32S3 Sense (found with "dmesg | grep tty")
ser = serial.Serial(
    port='/dev/ttyACM0',
    baudrate=115200,
    parity=serial.PARITY_NONE,
    stopbits=serial.STOPBITS_ONE,
    bytesize=serial.EIGHTBITS,
    timeout=1
)

while True:
    # Read one detection message from the XIAO, e.g. b"refrigerator on the left"
    receive_msg = ser.readline()
    print(receive_msg)
    # Speak a phrase through Festival when a known object name is received
    if b'basin' in receive_msg.lower():
        os.system('echo "basin in front" | festival --tts')
    if b'bed' in receive_msg.lower():
        os.system('echo "bed in front" | festival --tts')
    if b'chair' in receive_msg.lower():
        os.system('echo "chair in front" | festival --tts')
    if b'dining table' in receive_msg.lower():
        os.system('echo "dining table in front" | festival --tts')
    if b'oven' in receive_msg.lower():
        os.system('echo "oven in front" | festival --tts')
    if b'refrigerator' in receive_msg.lower():
        os.system('echo "refrigerator in front" | festival --tts')
The Final Setup
The XIAO ESP32S3 Sense board will get power from the Raspberry Pi. We can use a power bank to power the Pi.