
Published September 4, 2024

Reading Aid for the Visually Impaired Using XIAO ESP32S3

A compact, real-time reading aid for the visually impaired that converts printed text to speech using the Seeed Studio XIAO ESP32S3.


Things used in this project

Hardware components

Seeed Studio XIAO ESP32S3 Sense

Adafruit MAX98357A Amplifier

DIY Speaker

Software apps and online services

Arduino IDE

Node.js

Tesseract.js

Google Text-to-Speech (TTS)

Story

Introduction

Globally, approximately 2.2 billion people have a vision impairment, and about 90% of visually impaired people live in low-income countries. For them, affordable and accessible solutions are crucial. In this project, I show how computer vision can improve accessibility by enabling visually impaired people to read printed text independently.

The challenge faced by visually impaired people is their limited ability to perceive and navigate their surroundings, which can restrict their mobility. To address this, I developed a system that combines optical character recognition (OCR) and text-to-speech conversion to provide real-time reading assistance. The result is a device that reads printed text aloud, helping visually impaired users understand their environment through audio feedback.

Building the Solution with Seeed Studio XIAO ESP32S3

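The heart of this project is the Seeed Studio XIAO ESP32S3 microcontroller, chosen for its compact size, built-in Wi-Fi, and processing headroom: a dual-core ESP32-S3 running at up to 240 MHz with 8 MB of PSRAM, plus a camera expansion board on the Sense variant. The board captures images and plays audio, while text recognition runs on a separate Node.js server. It is integrated with the following key components.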

Capturing Text with the OV2640 Camera Sensor: The OV2640 camera sensor on the XIAO ESP32S3 Sense captures images of printed text at up to 1600 × 1200 pixels (UXGA), the frame size set in the firmware. Each image is then sent to a server for text extraction.
Text Recognition with Tesseract.js: A Node.js server runs Tesseract.js to perform Optical Character Recognition (OCR) on the captured images, converting the visual text into digital text. Tesseract.js supports more than 100 languages; this project uses English ("eng").
Text-to-Speech Conversion with Google TTS: The recognized text is converted into natural-sounding speech using Google Text-to-Speech (TTS), providing clear audio output for the user. The firmware requests the speech through the connecttospeech() function of the ESP32-audioI2S Audio library, which streams it over Wi-Fi. Google's Cloud Text-to-Speech API supports over 50 languages and variants, with a selection of more than 380 voices; this project requests English ("en").
Audio Enhancement with MAX98357A Amplifier and DIY Speaker: The MAX98357A receives the digital audio over I2S, converts it to analog, and amplifies it to drive the DIY speaker, so the speech is loud and clear.
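Tesseract's output keeps the line breaks of the printed page, and a long page can produce more text than is comfortable to send in one TTS request. One optional pre-processing step, sketched here in JavaScript so it could run on the Node.js server before the JSON reply is sent, collapses whitespace and splits the text at word boundaries. The helper name prepareForSpeech and the default 200-character limit are assumptions for illustration, not values from this project or from Google's documentation; the firmware as written passes the whole string to a single connecttospeech() call.

```javascript
// Hypothetical helper: normalize OCR output and split it into speech-sized chunks.
// maxLen = 200 is an assumed limit, not a documented one.
function prepareForSpeech(text, maxLen = 200) {
  // Collapse line breaks, tabs and repeated spaces into single spaces.
  const words = text.replace(/\s+/g, ' ').trim().split(' ').filter(Boolean);
  const chunks = [];
  let current = '';
  for (const word of words) {
    if (word.length > maxLen) {
      // A single over-long token is hard-split so that no chunk exceeds maxLen.
      if (current) { chunks.push(current); current = ''; }
      for (let i = 0; i < word.length; i += maxLen) {
        chunks.push(word.slice(i, i + maxLen));
      }
      continue;
    }
    const candidate = current ? current + ' ' + word : word;
    if (candidate.length > maxLen) {
      chunks.push(current); // current is non-empty here
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Example: a short maxLen makes the splitting visible.
console.log(prepareForSpeech('EXIT\nonly  in\temergency', 10));
// [ 'EXIT only', 'in', 'emergency' ]
```

Each chunk could then be spoken in turn; splitting at spaces avoids cutting words in half, which matters more for intelligibility than packing chunks to the exact limit.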

Hardware Build

The prototype is built using the following hardware components: the Seeed Studio XIAO ESP32S3 microcontroller, a MAX98357A I2S digital-to-analog converter (DAC) and amplifier, and a speaker. Let's go through the schematic step-by-step to understand how these components are connected:

Circuit Diagram

Wiring Connections

Here is how the components are connected according to the schematic:

Power Connections: The 5V pin of the Seeed Studio XIAO ESP32S3 is connected to the Vin pin of the MAX98357A. GND (Ground) on the XIAO ESP32S3 is connected to the GND pin of the MAX98357A to establish a common ground between both components.
I2S Data Connections: The firmware assigns the I2S signals to pins D2, D0 and D1 (the I2S_BCLK, I2S_LRC and I2S_DOUT defines). The BCLK (Bit Clock) pin on the MAX98357A is connected to D2 on the XIAO ESP32S3; it carries the clock for each bit of digital audio. The LRC (Left-Right Clock, or word select) pin is connected to D0; it indicates whether the current sample belongs to the left or right channel. The DIN (Data In) pin is connected to D1; it carries the digital audio samples from the microcontroller to the amplifier's DAC.
Speaker Connections:The + and - terminals of the speaker are connected to the + and - output pins on the MAX98357A amplifier, respectively. This connection allows the amplified analog audio signal to be output to the speaker.
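The BCLK and LRC lines carry clocks whose frequencies follow directly from the audio format: LRC completes one cycle per sample frame, so it runs at the sample rate, and BCLK pulses once for every bit of every channel in that frame. The small JavaScript helper below computes both; i2sClocks is a hypothetical name, and the 44.1 kHz, 16-bit stereo format is only an illustrative example, not a measurement from this build.

```javascript
// Compute I2S clock rates for a given PCM format.
// LRC (word select) completes one cycle per frame, so it equals the sample rate.
// BCLK pulses once per bit, per channel, per frame.
function i2sClocks(sampleRateHz, bitsPerSample, channels = 2) {
  const lrcHz = sampleRateHz;
  const bclkHz = sampleRateHz * bitsPerSample * channels;
  return { lrcHz, bclkHz };
}

// Example: CD-quality audio, 44.1 kHz, 16-bit, stereo.
console.log(i2sClocks(44100, 16, 2)); // { lrcHz: 44100, bclkHz: 1411200 }
```

Some I2S drivers pad each sample into a wider slot (for example 32 bits); in that case the slot width, not the sample width, should be passed as bitsPerSample.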

Developing the System

Programming and Integration: The microcontroller firmware is written in C++ using the Arduino framework, and a Node.js server manages the data flow. The XIAO ESP32S3 uploads each captured image to the server over Wi-Fi; the server runs Tesseract.js on it and returns the extracted text as JSON, which the firmware then sends to Google TTS for playback.
Creating and Deploying the Model: OCR runs entirely on the Node.js server, using Tesseract.js with its pretrained English ("eng") language data, so no model is deployed to the microcontroller. What is deployed to the XIAO ESP32S3 is the firmware, uploaded from the Arduino IDE, which ties the camera, Wi-Fi, HTTP upload, and audio output together.
Testing and Calibration: The device is tested for accurate text recognition and clear audio output, and its settings are adjusted based on the results to improve performance and accuracy.
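On the wire, each reading cycle is one HTTP POST from the device to the server's /upload route. The body is multipart/form-data with a single part named file, which is what upload.single('file') on the server expects, and the reply is JSON. The JavaScript sketch below rebuilds the same byte layout that captureAndUploadImage() assembles by hand in the firmware (part header, JPEG bytes, closing boundary), which makes the size overhead explicit. buildMultipartBody is a hypothetical helper; its default boundary is the string the firmware uses.

```javascript
// Rebuild the multipart/form-data body the firmware sends:
// [part header][JPEG bytes][closing boundary], mirroring bodyStart/bodyEnd in the sketch.
function buildMultipartBody(jpegBytes, boundary = '----WebKitFormBoundary7MA4YWxkTrZu0gW') {
  const head = Buffer.from(
    `--${boundary}\r\n` +
      'Content-Disposition: form-data; name="file"; filename="image.jpg"\r\n' +
      'Content-Type: image/jpeg\r\n\r\n',
    'latin1'
  );
  const tail = Buffer.from(`\r\n--${boundary}--\r\n`, 'latin1');
  return {
    contentType: `multipart/form-data; boundary=${boundary}`,
    body: Buffer.concat([head, Buffer.from(jpegBytes), tail]),
    overhead: head.length + tail.length, // bytes added on top of the image itself
  };
}

// Example with a fake 3-byte "JPEG" payload.
const { body, overhead } = buildMultipartBody(Uint8Array.from([0xff, 0xd8, 0xff]));
console.log(body.length === overhead + 3); // true
```

Because the firmware allocates one buffer of exactly this total size, the overhead computed here is the extra memory needed beyond the camera frame itself.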

Software Setup

To get the device up and running, follow the software setup instructions below. This setup involves configuring the Arduino IDE, installing the required libraries (including the Audio library that streams Google Text-to-Speech for audio output), and setting up OCR processing with Tesseract.js.

1. Installing Arduino IDE and Required Libraries

Download and Install Arduino IDE: Visit the Arduino website, download the latest version of the Arduino IDE for your operating system, and install it following the instructions on the website.
Install the ESP32 Board Package: Open the Arduino IDE, go to File > Preferences, and add the following URL to the Additional Board Manager URLs field:

https://espressif.github.io/arduino-esp32/package_esp32_index.json

Next, navigate to Tools > Board > Boards Manager, search for "esp32", and install the esp32 package by Espressif Systems.
Add Required Libraries: Go to Sketch > Include Library > Manage Libraries..., then search for and install the ArduinoJson library. The sketch also includes Audio.h, which comes from the ESP32-audioI2S library; install it as well, for example by downloading it from its GitHub repository and adding it with Sketch > Include Library > Add .ZIP Library....

2. Setting Up Node.js for OCR

Tesseract.js is a JavaScript port of the Tesseract OCR engine, and here it runs in a Node.js environment. To set it up:

Install Node.js and npm (Node Package Manager): If you haven't already, download and install Node.js from the official website. The installation includes npm, which is needed to install dependencies.
Initialize a Node.js Project: Open a terminal or command prompt, navigate to the folder where you want to keep the project, and run:

mkdir ocr-server
cd ocr-server
npm init -y

Install Required Node.js Modules: Install express for creating the server, multer for handling the uploaded image, and tesseract.js for OCR processing:

npm install express multer tesseract.js

3. Writing the Code

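Open the Arduino IDE, create a new sketch, and paste in the Arduino IDE code from the Code section. Fill in your Wi-Fi ssid and password, and set serverUrl to http://<your-computer's-IP>:3000/upload. localhost will not work here, because the ESP32 reaches the server over your local network. The sketch also includes camera_pins.h, which is not a library header: copy it into the sketch folder from the CameraWebServer example that ships with the ESP32 board package (File > Examples > ESP32 > Camera > CameraWebServer).
Create a new file named server.js in the ocr-server directory and add the server code for OCR processing.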
To start the OCR server, run the following command in your terminal or command prompt:

node server.js

You should see a message indicating that the server is running:

Server is running at http://localhost:3000

Connect your microcontroller to the computer via a USB cable, select the XIAO_ESP32S3 board and the correct port from the Tools menu, enable PSRAM (Tools > PSRAM > OPI PSRAM) so the camera can use large frame buffers, and click the upload button.
Open the Serial Monitor in the Arduino IDE (Tools > Serial Monitor) to check for errors or debug messages.
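Because the ESP32 reaches the server over the local network, serverUrl in the sketch must use the computer's LAN address rather than localhost. The small Node.js helper below, a sketch using only the built-in os module, lists candidate URLs for this machine; candidateServerUrls is a hypothetical name, while port 3000 and the /upload route are taken from the server code.

```javascript
const os = require('os');

// List this computer's non-internal IPv4 addresses and build candidate
// serverUrl values for the firmware. Port 3000 and /upload match server.js.
function candidateServerUrls(port = 3000, route = '/upload', interfaces = os.networkInterfaces()) {
  const urls = [];
  for (const addrs of Object.values(interfaces)) {
    for (const a of addrs || []) {
      // Node 18.0-18.3 reported family as the number 4 instead of 'IPv4'.
      const isV4 = a.family === 'IPv4' || a.family === 4;
      if (isV4 && !a.internal) urls.push(`http://${a.address}:${port}${route}`);
    }
  }
  return urls;
}

console.log(candidateServerUrls());
```

If several addresses are printed (for example Ethernet and Wi-Fi), use the one on the same network as the ESP32, and make sure the computer's firewall allows incoming connections on port 3000.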

Output from Node.js Server

User Guidance and Feedback

The device not only converts printed text into speech but also gives real-time audio feedback that helps users position it correctly. If no text is recognized in a captured image, for example because the device is held too close to the page, it plays the prompt:

"Please keep the device at least 15 cm above the text."

Holding the device at this distance keeps the page within the camera's focus range and field of view, so the captured image is clear. The prompt helps users adjust the height and angle of the device, reducing text recognition errors and improving the overall user experience.
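In the firmware, the decision is simple: if the server returns any text, it is spoken; otherwise the positioning prompt is played. Because OCR on a blurry or partial frame can return a few stray symbols, a slightly stricter check could reduce false "successes". The JavaScript sketch below shows that decision step; the recognizedText field matches the server code, while the function name decideUtterance and the three-letter threshold are assumptions to tune, not values from the original project.

```javascript
const GUIDANCE_PROMPT = 'Please keep the device at least 15 cm above the text.';

// Hypothetical decision step mirroring loop(): speak the OCR result if it looks
// like real text, otherwise fall back to the positioning prompt.
// minLetters = 3 is an assumed threshold, not a value from the project.
function decideUtterance(responseBody, minLetters = 3) {
  let text = '';
  try {
    const data = JSON.parse(responseBody);
    if (typeof data.recognizedText === 'string') text = data.recognizedText.trim();
  } catch {
    // Malformed or empty response: treat it as "nothing recognized".
  }
  // Count Unicode letters so that stray punctuation does not count as text.
  const letters = (text.match(/\p{L}/gu) || []).length;
  return letters >= minLetters ? text : GUIDANCE_PROMPT;
}

console.log(decideUtterance('{"recognizedText":"  EXIT\\n"}')); // EXIT
console.log(decideUtterance('{"recognizedText":"~ |"}'));        // guidance prompt
console.log(decideUtterance(''));                                  // guidance prompt
```

A threshold that is too high would reject short but valid signs, so it is worth testing against the kind of text the device is expected to read.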

Output of User guidance feature

Working of the prototype

Final Integration

The completed system combines the XIAO ESP32S3 microcontroller, camera sensor, text recognition server, and audio components into a cohesive device. Users point the camera at printed text, and the text is read aloud once the image has been processed by the OCR server and the speech fetched from the TTS service. With the built-in guidance system, users also receive spoken feedback when the device needs to be repositioned for accurate text capture.

This project is a step toward affordable, accessible technology for the visually impaired. By providing a low-cost, real-time reading aid with user guidance, it offers a practical tool for enhancing independence and improving quality of life.

Code

Arduino IDE code

#include <WiFi.h>
#include <WiFiClientSecure.h>
#include <ArduinoJson.h>
#include "esp_camera.h"
#include "soc/soc.h"
#include "soc/rtc_cntl_reg.h"
#include <Arduino.h>
#include <HTTPClient.h>
#include "Audio.h"

// WiFi credentials
const char* ssid = "";
const char* password = "";

#define CAMERA_MODEL_XIAO_ESP32S3
#include "camera_pins.h"

// Server URL for uploading images and receiving text.
// Use your computer's LAN IP, e.g. "http://192.168.1.50:3000/upload" (not localhost).
const char* serverUrl = "";

#define I2S_DOUT   D1  
#define I2S_BCLK   D2  
#define I2S_LRC    D0  

Audio audio;

void setup() {
  Serial.begin(115200);
  Serial.setDebugOutput(true);
  Serial.println();

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sccb_sda = SIOD_GPIO_NUM;
  config.pin_sccb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.frame_size = FRAMESIZE_UXGA;
  config.pixel_format = PIXFORMAT_JPEG;
  config.grab_mode = CAMERA_GRAB_WHEN_EMPTY;
  config.fb_location = CAMERA_FB_IN_PSRAM;
  config.jpeg_quality = 12;
  config.fb_count = 1;

  if (config.pixel_format == PIXFORMAT_JPEG) {
    if (psramFound()) {
      Serial.println("Using PSRAM");
      config.jpeg_quality = 10;
      config.fb_count = 2;
      config.grab_mode = CAMERA_GRAB_LATEST;
    } else {
      config.frame_size = FRAMESIZE_SVGA;  // Limit the frame size when PSRAM is not available
      config.fb_location = CAMERA_FB_IN_DRAM;
    }
  } else {
    config.frame_size = FRAMESIZE_240X240;
#if CONFIG_IDF_TARGET_ESP32S3
    config.fb_count = 2;
#endif
  }

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x\n", err);
    return;
  }

  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);
  delay(1000);
  Serial.println("Connecting to WiFi...");
  unsigned long startTime = millis();
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    // Give up after 10 s; the subtraction stays correct even if millis() wraps around
    if (millis() - startTime > 10000) break;
  }
  if (WiFi.status() == WL_CONNECTED) {
    Serial.println("Connected to WiFi");
    Serial.print("IP address: ");
    Serial.println(WiFi.localIP());
  } else {
    Serial.println("Failed to connect to WiFi");
    return;
  }

  audio.setPinout(I2S_BCLK, I2S_LRC, I2S_DOUT);
  audio.setVolume(21);  // ESP32-audioI2S uses a 0-21 volume range by default (21 = maximum)

  playDetectedText("Welcome to the reader for the blind.");
}

void loop() {
  String detectedText = captureAndUploadImage();
  if (detectedText.length() > 0) {
    playDetectedText(detectedText);
  } else {
    playDetectedText("Please keep the device at least 15 cm above the text.");
  }
  delay(5000); 
}

String captureAndUploadImage() {
  camera_fb_t *fb = esp_camera_fb_get();
  if (!fb) {
    Serial.println("Failed to capture image");
    return "";
  }

  Serial.printf("Captured image of size: %u bytes\n", fb->len);
  Serial.printf("Free heap before sending: %d bytes\n", ESP.getFreeHeap());

  HTTPClient http;
  http.begin(serverUrl);
  http.setTimeout(15000); 

  String boundary = "----WebKitFormBoundary7MA4YWxkTrZu0gW";
  String contentType = "multipart/form-data; boundary=" + boundary;
  http.addHeader("Content-Type", contentType);

  String bodyStart = "--" + boundary + "\r\n";
  bodyStart += "Content-Disposition: form-data; name=\"file\"; filename=\"image.jpg\"\r\n";
  bodyStart += "Content-Type: image/jpeg\r\n\r\n";

  String bodyEnd = "\r\n--" + boundary + "--\r\n";

  size_t bodySize = bodyStart.length();
  size_t bodyEndSize = bodyEnd.length();
  size_t totalSize = bodySize + fb->len + bodyEndSize;

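  // UXGA JPEGs can be large, so allocate the request body in PSRAM when it is available
  uint8_t* bodyBuffer = (uint8_t*)(psramFound() ? ps_malloc(totalSize) : malloc(totalSize));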
  if (bodyBuffer == NULL) {
    Serial.println("Failed to allocate memory for body buffer");
    esp_camera_fb_return(fb);
    return "";
  }

  memcpy(bodyBuffer, bodyStart.c_str(), bodySize);
  memcpy(bodyBuffer + bodySize, fb->buf, fb->len);
  memcpy(bodyBuffer + bodySize + fb->len, bodyEnd.c_str(), bodyEndSize);

  int httpResponseCode = http.sendRequest("POST", bodyBuffer, totalSize);
  String response = "";
  if (httpResponseCode > 0) {
    response = http.getString();
    Serial.println("Server response: " + response);
  } else {
    Serial.printf("Error on HTTP request: %s\n", http.errorToString(httpResponseCode).c_str());
  }

  free(bodyBuffer);
  http.end();
  esp_camera_fb_return(fb);
  Serial.printf("Free heap after sending: %d bytes\n", ESP.getFreeHeap());
  return extractTextFromResponse(response);
}

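String extractTextFromResponse(const String& response) {
  // ArduinoJson 7.x: JsonDocument grows as needed (on 6.x use DynamicJsonDocument doc(8192);)
  JsonDocument doc;
  DeserializationError error = deserializeJson(doc, response);
  if (error) {
    Serial.printf("JSON parse failed: %s\n", error.c_str());
    return "";
  }
  // server.js returns the OCR result under "recognizedText"; default to "" if missing
  String detectedText = doc["recognizedText"] | "";
  detectedText.trim();  // Tesseract output usually ends with a newline
  return detectedText;
}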

void playDetectedText(String text) {
  Serial.println("Playing detected text: " + text);
  audio.connecttospeech(text.c_str(), "en"); // Google TTS
  while (audio.isRunning()) {
    audio.loop();
  }
}

void audio_info(const char *info) {
  Serial.print("audio_info: "); Serial.println(info);
}

Server code

const express = require('express');
const multer = require('multer');
const path = require('path');
const fs = require('fs');
const Tesseract = require('tesseract.js');

const app = express();
const port = 3000;

// Resolve the upload folder relative to this file, not the current working directory
const uploadDir = path.join(__dirname, 'uploads');
if (!fs.existsSync(uploadDir)) {
    fs.mkdirSync(uploadDir);
}

const storage = multer.diskStorage({
    destination: (req, file, cb) => {
        cb(null, uploadDir);
    },
    filename: (req, file, cb) => {
        cb(null, Date.now() + path.extname(file.originalname));
    }
});

const upload = multer({ storage: storage });

app.use('/uploads', express.static(uploadDir));

// The device posts a multipart form with the JPEG in a field named "file"
app.post('/upload', upload.single('file'), (req, res) => {
    if (!req.file) {
        console.error('No file uploaded');
        return res.status(400).json({ error: 'No file uploaded' });
    }

    // With an absolute destination, multer's req.file.path is already absolute
    const filePath = req.file.path;

    Tesseract.recognize(filePath, 'eng', { logger: info => console.log(info) })
        .then(({ data: { text } }) => {
            console.log('Recognized text:', text);
            // JSON field returned to the device
            res.json({ recognizedText: text });
        })
        .catch(error => {
            console.error('Error during OCR:', error);
            res.status(500).json({ error: 'OCR processing failed' });
        })
        .finally(() => {
            // Delete the uploaded image whether OCR succeeded or failed
            fs.unlink(filePath, err => {
                if (err) console.error('Could not delete upload:', err);
            });
        });
});

app.listen(port, () => {
    console.log(`Server is running at http://localhost:${port}`);
});

Schematics

Circuit diagram

Credits

Bharath Ram
