This project demonstrates how to build a haptic-navigation device with a visual-information detection setup for visually impaired people.
The device is designed to be worn as a belt on the waist together with a cap. It helps the user navigate through binocular vision and depth analysis, acting as a 'precision guide' with vibration-based feedback, and it also provides an audio-based text-reading helper along with product and currency detection.
People with moderate or severe vision loss can use this device to travel safely, navigate around obstacles, stay safe from accidental collisions, read books or newspapers, and shop independently.
Build2gether Challenge
The project takes inspiration from the Build2gether 2.0 - Inclusive Innovation Challenge, a contest that highlights the transformative role of assistive technologies and encourages participants to look beyond conventional methods and create innovative, useful, and novel solutions for people with disabilities.
The contest focused on two primary areas of innovation: support for visually impaired and mobility-impaired individuals. The competition was divided into two main tracks, each dedicated to developing adaptations for outdoor or indoor activities tailored to the needs of these individuals. The judging panel, composed of experts who share the same background as the target groups, ensured that the evaluation process was thorough and empathetic to the real-world challenges these communities face.
The contest provided participants with a unique opportunity to engage in an interactive and collaborative environment, fostering the creation of novel solutions that can significantly enhance the quality of life for those with visual or mobility impairments.
Problem Identification
As mentioned, the competition focused on addressing the challenges faced by visually impaired and mobility-impaired individuals, with a particular emphasis on two key areas: adaptations for outdoor and indoor activities. Given the global context, where approximately 2.2 billion people live with some form of vision impairment according to the World Health Organization (WHO), the importance of such innovations cannot be overstated. Among these individuals, 36 million are blind, and 217 million have moderate or severe vision impairment, highlighting the pressing need for technological solutions that can enhance their daily lives.
1] SAFE TRAVEL AND NAVIGATION
My focus was on the track dedicated mainly to outdoor activities, with some indoor activities such as reading. Through surveys I identified specific problems so I could develop a solution that would significantly improve the mobility and independence of visually impaired individuals. The first issue I addressed was the challenge of safe travel in unfamiliar environments. Despite the use of mobility canes, visually impaired individuals often face difficulties detecting obstacles beyond the cane's reach, leading to potential collisions, difficulty navigating crowded areas, and trouble with sudden changes in terrain. These obstacles can make independent travel a daunting and sometimes dangerous experience.
2] ACCESS TO VISUAL INFORMATION
Another major problem I focused on is the inability of visually impaired individuals to access visual information, such as reading currency notes or product details while shopping. This lack of access not only limits their independence but also creates a psychological burden of trust and insecurity when dealing with financial transactions and daily needs. For example, my relative, who lacks light perception, struggles with tasks as simple as identifying currency or understanding product labels, which are not universally available in braille. This issue highlights a broader societal challenge in making everyday tasks more accessible for those with vision impairments.
By focusing on these specific problems, I aimed to contribute to a more inclusive society where visually impaired individuals can navigate their environments and manage their daily tasks related to visual information with greater ease and confidence.
Developing An Elegant Solution
My solution is an innovative smart belt paired with a cap, designed specifically for visually impaired individuals to enhance their ability to navigate various environments safely and independently, along with the ability to detect currency notes, read text from books or newspapers, and access product details. This wearable assistive technology integrates a high-resolution camera, a Seeed Studio XIAO ESP32S3 Sense, ultrasonic sensors, haptic (vibration) feedback modules, gyroscopic sensors, a small speaker, an STM32F4 Black Pill microcontroller, and single-board computers such as the Unihiker and Raspberry Pi 5, all orchestrated to provide real-time guidance to visually impaired individuals.
The device is a real-time assistive technology named Navigator and Visual Information Scanner [Nav-VIS].
It is divided into three subsystems as follows:
1] The Guide - The navigation component of the device. It contains vibration feedback and a Raspberry Pi 5* (8 GB) with two RGB cameras for binocular vision, pre-trained metric depth models for the depth map, and a custom-trained YOLOv9-based model for object detection, along with feedback from ultrasonic and accelerometer sensors via the STM32 Black Pill. Together these feed an A* algorithm that navigates in real time through crowded environments.
2] The Protector - The main purpose of this subsystem is to protect the user from sudden, unwanted collisions. It is designed for high-urgency alerts: it vibrates strongly when an object approaches the user too quickly, and it also notifies the user of nearby obstacles through vibration feedback. It mainly consists of an STM32F4, vibration motors, and ultrasonic sensors.
3] The Helper - An assistive subsystem designed solely for text reading, product detection, and currency detection. It uses Seeed Studio's XIAO ESP32S3 Sense camera for currency classification (custom-trained with Edge Impulse Studio), a small camera (webcam) for text and product detection, and the Unihiker* (a Linux-based single-board computer) as its core computer, with libraries such as Tesseract.
*Using a single-board computer like an NVIDIA Jetson would provide higher precision and accuracy.
**For depth estimation, the use of high-precision LiDARs or RealSense cameras was deliberately avoided due to two primary factors. Firstly, the financial and computational demands of these sensors are significant, which may not be feasible for a cost-effective and portable solution. Secondly, LiDAR technology involves the continuous emission of laser radiation, which raises concerns regarding safety and energy consumption in prolonged use, and such technology isn't budget-friendly.
The Hardware Design
Let's understand how the device works and how its hardware is structured into three subsystems to handle the tasks mentioned above. Here is the flowchart overview; we will then go through each subsystem individually.
The product hardware setup and design of all three subsystems are explained below in detail.
THE GUIDE
The Guide uses a novel methodology to navigate around obstacles using object detection and metric depth estimation. The depth analysis is done with a pre-trained deep-learning model, Depth Anything, which provides both indoor and outdoor pre-trained models for robust and accurate depth estimation. The Guide subsystem has two RGB cameras arranged so that the input is captured as binocular vision with approximately a 150-degree field of view, using a standard OpenCV program for video capture.
Each video frame is processed continuously: the custom-trained object detection model (COCO dataset) is deployed to find specific objects or obstacles in the field of view, then the Depth Anything metric depth analysis produces a depth map of that frame, which is scaled to a grid of relative depths, and the A* algorithm is applied with the farthest reachable position as the destination. The A* algorithm returns the shortest path that avoids the obstacles. Haptic feedback is driven from the GPIO pins of the Raspberry Pi to the vibration motor coins for moving forward, backward, left, and right. To keep the environment consistent, the code is organised within a Python virtual environment and runs as a Python process on the Pi.
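The article does not list the exact GPIO wiring of the vibration motor coins, so the fragment below is only a rough sketch of how the haptic feedback could be driven with gpiozero; the pin numbers are hypothetical placeholders, and the navigation code would call buzz() with the direction derived from the first step of the A* path.

# Illustrative sketch only: driving the four vibration motor coins from the Pi's GPIO
# with gpiozero. BCM pins 17, 27, 22 and 23 are placeholders, not the project's wiring.
from gpiozero import OutputDevice
from time import sleep

motors = {
    "forward": OutputDevice(17),
    "backward": OutputDevice(27),
    "left": OutputDevice(22),
    "right": OutputDevice(23),
}

def buzz(direction, duration=0.5):
    """Pulse the motor for the given direction (e.g. the first step of the A* path)."""
    motor = motors[direction]
    motor.on()
    sleep(duration)
    motor.off()

# Example: tell the user to step to the left
buzz("left")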
Hardware Build Instructions and Code [ Guide ]
The connections for the Guide subsystem are the simplest. Starting with its core computer, flash the Raspberry Pi 5 from an SD card or USB drive with Raspberry Pi OS (Legacy, 64-bit) Full. Then, after connecting the Raspberry Pi to a monitor, mouse, and keyboard, start by installing or upgrading the following Python (version 3.9) libraries:
gradio_imageslider
gradio==4.29.0
matplotlib
opencv-python
torch
torchvision
If you face any problem with torch (since the Raspberry Pi uses an ARM architecture, it may show "illegal instruction" errors), the article on this website explains how to install it for the RPi here.
For connection refer to the following image:
TRAINED MODEL - METRIC DEPTH ESTIMATION
The code for the following Guide navigation setup is uploaded to this GitHub repository, which contains three different files and a custom-trained model for object detection. You can refer to the Depth Anything GitHub page for the depth-anything model, which provides three models; this project uses the Base model. Depth Anything uses a data engine to automate the annotation process for the vast corpus of unlabeled images it harnesses.
The code takes various arguments: the webcam/video path, the name and path of the depth-anything model you are using, and the encoder type.
Here are some previews of this model (credits: Depth Anything).
The Depth Anything model is highly accurate for depth estimation. It produces a metric depth map, but we are not interested in the depth values of the complete image, only in the depths of the specific objects that act as obstacles in the surroundings.
Here is the step-by-step thought process and working of the code:
- First, boot up the RPi and create a folder for this subsystem. Start by importing the libraries; we are using VS Code for editing (any IDE compatible with the RPi can be used).
import argparse
import cv2
import numpy as np
import os
import torch
import time
import matplotlib.cm
import check
from depth_anything_v2.dpt import DepthAnythingV2
# If you are wondering what 'check' is: it is just a second Python file that sends
# the coordinates of detected objects to the A* algorithm
To import the Depth Anything model, you can either git clone Depth-Anything-V2 or place the downloaded files directly inside the folder that contains this Python file.
- Download the custom-trained object detection model from GitHub and place it in this folder as well. Then we just need to give the paths to the model and dataset files.
classNames = []
classFile = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/coco.names"
with open(classFile, "rt") as f:
    classNames = f.read().rstrip("\n").split("\n")

configPath = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
weightsPath = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/frozen_inference_graph.pb"
- Now we deploy the model and define an object detection function, which will be called later:
net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

def getObjects(img, thres, nms, draw=True, objects=[]):
    classIds, confs, bbox = net.detect(img, confThreshold=thres, nmsThreshold=nms)
    objectInfo = []
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]
            if className in objects:
                objectInfo.append([box, className])
                if draw:
                    cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
                    cv2.putText(img, classNames[classId - 1].upper(), (box[0], box[1] - 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
                    cv2.putText(img, str(round(confidence * 100, 2)), (box[0] - 200, box[1] - 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    return img, objectInfo
- Next we process each frame and locate the custom objects we care about with bounding boxes. We do this by creating a function that is called whenever OpenCV captures an image from the video; the frame rate naturally depends on how fast each image is processed. The function first draws a bounding box on each custom object, then reads the depth of that object, and finally sends the data to the other file (defined later).
The depth estimation assumes that each object's coordinate is the centre of its detected bounding box and lies in a 2D plane; it can be further converted into a 3D reconstruction using the depth map, as in the sketch after the function below.
def process_frame(frame, depth_anything, args, frame_count):
    raw_image = frame
    coordinates = []

    # Perform the object detection
    result, objectInfo = getObjects(raw_image, 0.45, 0.2,
                                    objects=['person', 'chair', 'table', 'dining table', 'car'])

    # Print metric depth only where the objects we care about are detected
    if len(objectInfo) > 0:
        print(f'Frame {frame_count}: Metric depth (in meters) at object locations:')
        for box, className in objectInfo:
            x, y, w, h = box
            center_x = x + w // 2
            center_y = y + h // 2
            # Ensure the centre coordinates are within the image bounds
            if center_y < 0 or center_y >= raw_image.shape[0] or center_x < 0 or center_x >= raw_image.shape[1]:
                continue
            depth = depth_anything.infer_image(raw_image, args.input_size)
            depth_meter = depth[center_y, center_x] * args.max_depth / 255.0
            # time.sleep(2)  # Uncomment only when cross-checking the output
            center_x = int(center_x * 0.075) - 17   # scale the image x-coordinate to the grid
            depth_meter = int(depth_meter) + 1      # scale the depth to grid columns
            print(depth_meter, center_x)
            coordinates.append([center_x, depth_meter])
            print(f'Object: {className}, Depth: {depth_meter} meters', center_x)
        # 'check' sends the collected coordinates to the A* algorithm
        check.Find_mypath(coordinates)
        print()

    # Perform depth estimation on the entire frame
    depth = depth_anything.infer_image(raw_image, args.input_size)

    # Normalise and convert the depth to a visual representation for the depth map
    depth_visual = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
    depth_visual = depth_visual.astype(np.uint8)
    depth_meters = depth * args.max_depth / 255.0  # full-frame metric depths (not used further here)

    if args.grayscale:
        depth_visual = cv2.cvtColor(depth_visual, cv2.COLOR_GRAY2BGR)
    else:
        cmap = matplotlib.cm.get_cmap('Spectral')
        depth_visual = (cmap(depth_visual)[:, :, :3] * 255).astype(np.uint8)

    # Combine the original frame with the depth visualisation
    split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
    combined_result = np.hstack([raw_image, split_region, depth_visual])

    return combined_result
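As noted in the step above, each obstacle is treated as a single point on the 2D image plane. If a full 3D position were ever needed, the centre pixel and its metric depth could be back-projected with a basic pinhole-camera model. The snippet below is only an illustration; the intrinsics fx, fy, cx, and cy are placeholder values that would normally come from calibrating the cameras.

# Illustrative sketch: back-projecting a pixel plus metric depth into a 3D point.
# The intrinsics below are placeholder values, not calibrated camera parameters.
import numpy as np

def pixel_to_3d(u, v, depth_m, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Return the (X, Y, Z) camera-frame coordinates of pixel (u, v) at depth_m metres."""
    X = (u - cx) * depth_m / fx
    Y = (v - cy) * depth_m / fy
    Z = depth_m
    return np.array([X, Y, Z])

# Example: a person detected at pixel (400, 260) and 3 m away
print(pixel_to_3d(400, 260, 3.0))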
- Now that all the functions are created, let's deploy the metric depth estimation model. The script estimates depth from a webcam feed in real time using the Depth Anything V2 model. It uses argparse for command-line arguments, torch for model handling, and OpenCV for video processing and display. The Depth Anything V2 model is initialized with the selected configuration, the weights are loaded from the specified checkpoint file, and the model is moved to the specified device (GPU or CPU) and set to evaluation mode; here it is the CPU, since the Pi's ARM processor has no CUDA GPU.
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Depth Anything V2 Metric Depth Estimation on Webcam')
    parser.add_argument('--video-path', type=int, default=4, help='Webcam index for video capture (default: 4)')
    parser.add_argument('--input-size', type=int, default=1, help='Input size for image processing')
    parser.add_argument('--outdir', type=str, default='./vis_depth', help='Output directory')
    parser.add_argument('--encoder', type=str, default='vits', choices=['vits', 'vitb', 'vitl', 'vitg'], help='Encoder type')
    parser.add_argument('--load-from', type=str, default='/home/dsay/Documents/hackster/depthanything/Depth-Anything-V2/metric_depth/checkpoints/depth_anything_v2_metric_hypersim_vits.pth', help='Path to model checkpoint')
    parser.add_argument('--max-depth', type=float, default=50, help='Maximum depth value')
    parser.add_argument('--save-numpy', dest='save_numpy', action='store_true', help='Save the model raw output')
    parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='Only display the depth prediction')
    parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='Do not apply colorful palette')
    args = parser.parse_args()

    # Initialise the device
    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Initialise the model configuration based on the chosen encoder
    model_configs = {
        'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
        'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
        'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
        'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
    }

    # Initialise the DepthAnythingV2 model
    depth_anything = DepthAnythingV2(**{**model_configs[args.encoder], 'max_depth': args.max_depth})
    depth_anything.load_state_dict(torch.load(args.load_from, map_location='cpu'))
    depth_anything = depth_anything.to(DEVICE).eval()
Here are some pictures (screenshots) of how the depth map is generated:
- Now we do the normal video capture under the main script. This code captures video frames from a webcam or video file, processes each frame using the depth estimation model, and displays the results in real time. The loop continues until the user presses the 'q' key or the video ends. After the loop, the resources are released and any open windows are closed. On the Raspberry Pi, a push button is wired in as well; if it is pressed, the video depth estimation stops (a sketch of that button check follows the code below).
    cap = cv2.VideoCapture(args.video_path)
    frame_count = 0

    # Create the output directory if it does not exist
    os.makedirs(args.outdir, exist_ok=True)

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_count += 1
        combined_result = process_frame(frame, depth_anything, args, frame_count)
        cv2.imshow('Depth Estimation', combined_result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the webcam capture and close all windows
    cap.release()
    cv2.destroyAllWindows()
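The wiring of the stop button is not shown in this article, so the following is only a sketch of how the push-button check could replace the 'q' key on the Pi, assuming the button sits on a hypothetical GPIO 21 and using gpiozero.

# Sketch only: stopping the capture loop with a physical push button on the Pi.
# GPIO 21 is a placeholder; use whatever pin the button is actually wired to.
from gpiozero import Button
import cv2

stop_button = Button(21)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret or stop_button.is_pressed:   # stop when the button is pressed
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()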
Navigation Algorithm
The Python script (as mentioned above, I named it check.py) implements the A* algorithm, a popular pathfinding and graph-traversal technique used to find the shortest path between two points on a grid. In this case, the grid is a 15x5 matrix where each cell can either be walkable (unblocked) or non-walkable (blocked). The primary goal of the script is to navigate from a starting point (referred to as the source) to a destination on this grid, avoiding obstacles and finding the most efficient route.
The grid is initially set up with all cells marked as unblocked, meaning they are walkable. However, the user can specify certain cells to be blocked, making them impassable. This flexibility allows the simulation of different environments where obstacles might be present. The grid is represented as a two-dimensional list, where each element is either a 1 (indicating the cell is unblocked) or a 0 (indicating the cell is blocked). The A* algorithm then operates on this grid to find the optimal path from the source to the destination.
Below is the code for a standard A* algorithm; you can watch this video by MATLAB for a better explanation of the concepts behind how it works:
import math
import heapq

### Define the Cell class ###
class Cell:
    def __init__(self):
        self.parent_i = 0  # Parent cell's row index
        self.parent_j = 0  # Parent cell's column index
        self.f = float('inf')  # Total cost of the cell (g + h)
        self.g = float('inf')  # Cost from start to this cell
        self.h = 0  # Heuristic cost from this cell to destination
# Define the size of the grid
ROW = 15
COL = 5

# Check if a cell is valid (within the grid)
def is_valid(row, col):
    return (row >= 0) and (row < ROW) and (col >= 0) and (col < COL)

# Check if a cell is unblocked
def is_unblocked(grid, row, col):
    return grid[row][col] == 1

# Check if a cell is the destination
def is_destination(row, col, dest):
    return row == dest[0] and col == dest[1]

# Calculate the heuristic value of a cell (Euclidean distance to destination)
def calculate_h_value(row, col, dest):
    return ((row - dest[0]) ** 2 + (col - dest[1]) ** 2) ** 0.5

# Trace the path from source to destination
def trace_path(cell_details, dest):
    print("The Path is ")
    path = []
    row = dest[0]
    col = dest[1]

    # Trace the path from destination to source using parent cells
    while not (cell_details[row][col].parent_i == row and cell_details[row][col].parent_j == col):
        path.append((row, col))
        temp_row = cell_details[row][col].parent_i
        temp_col = cell_details[row][col].parent_j
        row = temp_row
        col = temp_col

    # Add the source cell to the path
    path.append((row, col))
    # Reverse the path to get the path from source to destination
    path.reverse()

    # Print the path
    for i in path:
        print("->", i, end=" ")
    print()
# Implement the A* search algorithm
def a_star_search(grid, src, dest):
    # Check if the source and destination are valid
    if not is_valid(src[0], src[1]) or not is_valid(dest[0], dest[1]):
        print("Source or destination is invalid")
        return

    # Check if the source and destination are unblocked
    if not is_unblocked(grid, src[0], src[1]) or not is_unblocked(grid, dest[0], dest[1]):
        print("Source or the destination is blocked")
        return

    # Check if we are already at the destination
    if is_destination(src[0], src[1], dest):
        print("We are already at the destination")
        return

    # Initialize the closed list (visited cells)
    closed_list = [[False for _ in range(COL)] for _ in range(ROW)]
    # Initialize the details of each cell
    cell_details = [[Cell() for _ in range(COL)] for _ in range(ROW)]

    # Initialize the start cell details
    i = src[0]
    j = src[1]
    cell_details[i][j].f = 0
    cell_details[i][j].g = 0
    cell_details[i][j].h = 0
    cell_details[i][j].parent_i = i
    cell_details[i][j].parent_j = j

    # Initialize the open list (cells to be visited) with the start cell
    open_list = []
    heapq.heappush(open_list, (0.0, i, j))

    # Initialize the flag for whether the destination is found
    found_dest = False

    # Main loop of the A* search algorithm
    while len(open_list) > 0:
        # Pop the cell with the smallest f value from the open list
        p = heapq.heappop(open_list)

        # Mark the cell as visited
        i = p[1]
        j = p[2]
        closed_list[i][j] = True

        # For each direction, check the successors
        directions = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
        for dir in directions:
            new_i = i + dir[0]
            new_j = j + dir[1]

            # If the successor is valid, unblocked, and not visited
            if is_valid(new_i, new_j) and is_unblocked(grid, new_i, new_j) and not closed_list[new_i][new_j]:
                # If the successor is the destination
                if is_destination(new_i, new_j, dest):
                    # Set the parent of the destination cell
                    cell_details[new_i][new_j].parent_i = i
                    cell_details[new_i][new_j].parent_j = j
                    print("The destination cell is found")
                    # Trace and print the path from source to destination
                    trace_path(cell_details, dest)
                    found_dest = True
                    return
                else:
                    # Calculate the new f, g, and h values
                    g_new = cell_details[i][j].g + 1.0
                    h_new = calculate_h_value(new_i, new_j, dest)
                    f_new = g_new + h_new

                    # If the cell is not in the open list or the new f value is smaller
                    if cell_details[new_i][new_j].f == float('inf') or cell_details[new_i][new_j].f > f_new:
                        # Add the cell to the open list
                        heapq.heappush(open_list, (f_new, new_i, new_j))
                        # Update the cell details
                        cell_details[new_i][new_j].f = f_new
                        cell_details[new_i][new_j].g = g_new
                        cell_details[new_i][new_j].h = h_new
                        cell_details[new_i][new_j].parent_i = i
                        cell_details[new_i][new_j].parent_j = j

    # If the destination is not found after visiting all cells
    if not found_dest:
        print("Failed to find the destination cell")
def main():
    # Define the size of the grid
    ROW = 15
    COL = 5

    # Initialize the grid with all elements as 1 (unblocked)
    grid = [[1] * COL for _ in range(ROW)]

    # Input number of blocked cells
    # num_blocked = int(input("Enter number of cells to block: "))

    # Input indices of cells to be blocked
    for n in range(2):
        row = int(input(f"Enter row index for blocked cell {n + 1}: "))
        col = int(input(f"Enter column index for blocked cell {n + 1}: "))
        # Check if the indices are within bounds
        if 0 <= row < ROW and 0 <= col < COL:
            grid[row][col] = 0
        else:
            print(f"Invalid indices ({row}, {col}). Ignoring this cell.")

    # Print the grid to show the initial setup
    # print("Initial Grid:")
    # for row in grid:
    #     print(row)

    # Input source and destination
    # src_row = int(input("Enter source row: "))
    # src_col = int(input("Enter source column: "))
    # dest_row = int(input("Enter destination row: "))
    # dest_col = int(input("Enter destination column: "))
    src = [7, 0]
    dest = [7, 4]

    # Run the A* search algorithm
    a_star_search(grid, src, dest)

if __name__ == "__main__":
    main()
In the script, the main() function sets up the grid and allows the user to block certain cells by specifying their row and column indices. After setting up the grid, the A* search algorithm is executed, starting from a predefined source cell at coordinates (7, 0) and attempting to reach a destination at coordinates (7, 4). If a path is found, the algorithm prints the sequence of grid cells that make up the path; if not, it informs the user that no valid path exists.
The video for depth estimation and the A* algorithm shows a recording that simulates the real-world scenario being analyzed. Since it is not possible to directly show what the camera observes in the actual environment, the simulation is visualized on a monitor through screen recording. This makes it easier to analyze and understand how the depth estimation and pathfinding would operate in a real-world setting. The use of the device itself is already shown in the video above.
Since we assumed a 15x5 grid on a 2D plane, the x values (the x coordinate of the image frame scaled to the real world) and the z values (the measured depths) are placed onto that grid; the user is assumed to be at the centre of the near edge of the grid, and the A* algorithm heads toward the farthest point of the grid. A sketch of how the detected coordinates could be turned into this grid is shown below. The video for the code simulation is:
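The listing above shows only main(); the Find_mypath() entry point that process_frame() calls is not reproduced in the article. The fragment below is therefore a hypothetical sketch of how the detected [x, depth] pairs could be mapped onto the 15x5 grid before calling a_star_search(); the clamping and offsets are assumptions, while the fixed source (7, 0) and destination (7, 4) follow the values used above.

# Hypothetical sketch of check.Find_mypath: turn detected [x, depth] pairs into
# blocked cells on the 15x5 grid and run the A* search defined above.
ROW = 15   # lateral cells (user at row 7, the centre of the near edge), as defined above
COL = 5    # depth cells (column 0 is nearest to the user)

def Find_mypath(coordinates):
    # Start with every cell walkable
    grid = [[1] * COL for _ in range(ROW)]

    # Block the cell corresponding to each detected obstacle
    for x, depth in coordinates:
        row = max(0, min(ROW - 1, x + 7))       # shift/clamp the scaled x onto rows 0..14 (assumed mapping)
        col = max(0, min(COL - 1, depth - 1))   # clamp the scaled depth onto columns 0..4
        grid[row][col] = 0

    # The user stands at (7, 0); aim for the farthest point straight ahead
    a_star_search(grid, [7, 0], [7, 4])

# Example: one obstacle slightly to the right, about two grid cells away
Find_mypath([[2, 2]])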
THE HELPER
The Helper is an assistive subsystem specifically designed to support visually impaired individuals by facilitating text reading, product identification, and currency detection. At its core, the system uses Seeed Studio's XIAO ESP32S3 Sense, a compact camera module used here for accurate currency classification. It is custom-trained with Edge Impulse (tinyML) to distinguish between different denominations of Indian currency with precision, ensuring reliable and quick identification during transactions.
In addition to currency detection, the subsystem incorporates a small, high-resolution camera (webcam) dedicated to recognizing and reading text from various sources, such as product labels, books, and other printed materials. It uses the Tesseract OCR engine via Python, an industry-standard Optical Character Recognition (OCR) engine capable of converting printed text in images into machine-readable text with high accuracy. The camera captures the visual data, which is processed by the Unihiker, a compact Linux-based single-board computer known for its robust performance and versatility. The detected text or currency is then converted to speech with the gTTS library, and an audio amplifier and speaker are used since the Unihiker supports audio output.
Hardware Build Instructions and Code [ Helper ]
The following are the circuit images and diagram of the Helper.
Circuit assembled originally-
Circuit diagram for Audio Output
Soldered into the board -
A zoomed-in image of the camera and the currency detector using the XIAO ESP32S3 Sense.
The USB cables of the scanner are connected to a USB hub, and the USB hub is then connected to the USB Type-A port of the Unihiker.
Unihiker Setup
Let's start with the Unihiker. For all the required libraries and updates, plug in the Unihiker and connect it to a network; you can follow the Unihiker startup guide.
- Run the following commands in the Unihiker's terminal to install the libraries. Check whether the Python version is 3.9; if not, update it.
sudo apt-get update
sudo apt-get upgrade
pip install opencv-python
pip install numpy
pip install pytesseract
sudo apt-get install tesseract-ocr
pip install gtts
pip install playsound
pip install pyserial
** We can use pyttsx3 for offline text-to-speech; a minimal sketch is shown below.
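If a network connection cannot be guaranteed, the gTTS calls used below can be swapped for pyttsx3, which synthesises speech locally. This is only a sketch of that alternative (it assumes pyttsx3 and a local speech engine such as espeak are installed on the Unihiker), not part of the original build.

# Sketch: offline text-to-speech with pyttsx3 as a drop-in alternative to gTTS.
import pyttsx3

engine = pyttsx3.init()          # uses the locally installed speech engine
engine.setProperty('rate', 150)  # words per minute; adjust to taste

def speak(text):
    """Speak the given text without needing an internet connection."""
    engine.say(text)
    engine.runAndWait()

speak("Five hundred rupee note detected")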
- We will use buttons A and B for selecting text reading or money detection, respectively, so we require the pinpong library, which can be installed using the instructions given in the guide above.
For text detection, we will create two Python files: one for capturing an image from the live camera video, and the other for converting the captured image to audio. To keep things simple, let's first understand the second Python file, which receives an image.
- In the Unihiker's terminal, run nano "name_of_python_file" with the .py extension to edit a Python file (here the file is named ocr4.py). Start by importing the following libraries in ocr4.py:
import pytesseract
import time
import gtts
import playsound
- Define a function that is called when the image is captured, so that pytesseract can extract the text from the image into a string and pass it to gTTS for conversion into sound. The EasyOCR model was found to be much faster and more accurate for image-to-text conversion, but the Unihiker does not support the standard EasyOCR model (see the sketch after the function below).
def hello():
    time.sleep(1)  # time lag so the captured image has finished writing
    imgchar = pytesseract.image_to_string('captured_image.png')
    sound = gtts.gTTS(imgchar, lang='en')
    sound.save("hello.mp3")
    playsound.playsound("hello.mp3")
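For reference, the EasyOCR alternative mentioned above would look roughly like the snippet below on hardware powerful enough to run it (for example the Jetson-class boards discussed later); it is not used on the Unihiker in this build.

# Sketch: the EasyOCR alternative, for more capable hardware than the Unihiker.
import easyocr

reader = easyocr.Reader(['en'])                  # downloads the English model on first run
results = reader.readtext('captured_image.png')  # list of (bbox, text, confidence)
text = " ".join(item[1] for item in results)
print(text)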
- Finally, create a Python file that captures an image using OpenCV, saves it to the described path, and imports the file above to run OCR on it.
import cv2
import time
import ocr4  # the import name is ocr4 because that is the name of the Python file above

# Open the video file or camera stream
cap = cv2.VideoCapture(0)  # Replace with your video path/index; the default camera index is 0

if not cap.isOpened():
    print("Error: Could not open video.")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("End of video.")
        break

    # cv2.imshow('Frame', frame)  # Uncomment this if you want to see the captured image

    # Delay for 25 milliseconds (you can adjust this value)
    time.sleep(0.025)

    img_path = 'captured_image.png'
    cv2.imwrite(img_path, frame)
    print(f"Image saved as {img_path}")

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Read the last captured image aloud (calling the file described above)
ocr4.hello()

cap.release()
cv2.destroyAllWindows()
This was the text detection part; the complete code customised for the Unihiker will be explained below. For object detection, one can reuse the same object detection file used in the navigation subsystem.
Here is a small video of Text reading-
Currency Classification
This part of the Helper subsystem is executed using the Seeed Studio XIAO ESP32S3 Sense, as there is already a proper step-by-step explanation of how to use this module for image classification.
Following the methods given in the article above, I trained the model using 91 images of various currency denominations used in India, including the ten-rupee coin and the fifty, one hundred, two hundred, and five hundred rupee notes.
To deploy the classification model using the Arduino IDE, follow this method:
- Connect the module to the Arduino IDE, go to File > Preferences > Additional Boards Manager URLs, and paste the following .json links: https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_dev_index.json https://files.seeedstudio.com/arduino/package_seeeduino_boards_index.json
- You then need to install the Seeeduino XIAO board package from the Boards Manager.
- When the installation starts, you will see an output pop-up window. After the installation is complete, an “INSTALLED” option will appear
Now, after following the methods given in the image classification guide and deploying the model, open the example sketch from the examples menu (examples > esp32 > esp32_camera) and change the camera pin definitions on lines 32-75 of the sketch as given below:
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 10
#define SIOD_GPIO_NUM 40
#define SIOC_GPIO_NUM 39
#define Y9_GPIO_NUM 48
#define Y8_GPIO_NUM 11
#define Y7_GPIO_NUM 12
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 16
#define Y4_GPIO_NUM 18
#define Y3_GPIO_NUM 17
#define Y2_GPIO_NUM 15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM 47
#define PCLK_GPIO_NUM 13
We will get the detected class name and the confidence of the detected currency as string values in the Serial Monitor of the Arduino IDE. Comment out all the unnecessary prints and keep only the line that prints the name of the detected currency over serial. Since we will use a USB connection to the Unihiker to receive this serial output, set the baud rate to 115200 in the Arduino sketch.
As we have already installed pyserial on the Unihiker, check whether we are receiving the string values of the detected money after connecting the XIAO ESP32S3 Sense to the Unihiker through the USB hub, using the following code.
import serial

if __name__ == '__main__':
    ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)
    ser.flush()
    while True:
        if ser.in_waiting > 0:
            line = ser.readline().decode('utf-8').rstrip()
            print(line)  # print the detected currency label sent by the XIAO
As we can see below, the currency is being detected and shown in the terminal through serial communication; the Unihiker's USB serial support makes it easy to integrate with the XIAO ESP32S3 Sense camera.
REMARK - The port '/dev/ttyACM0' may change its index to 'ttyACM1' or some other value after the Unihiker reboots, so you need to check which port is in use (a small auto-detection sketch is shown below).
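Instead of trial and error, pyserial can list the available ports. The snippet below is an optional sketch, not part of the original code, that picks the first ttyACM device it finds.

# Optional sketch: auto-detect the XIAO's serial port instead of hard-coding it.
import serial
from serial.tools import list_ports

def find_xiao_port():
    """Return the first /dev/ttyACM* device found, or None."""
    for port in list_ports.comports():
        if 'ttyACM' in port.device:
            return port.device
    return None

port = find_xiao_port()
if port is None:
    raise RuntimeError("No ttyACM device found - is the XIAO plugged into the USB hub?")
ser = serial.Serial(port, 115200, timeout=1)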
Here is a video that shows currency detection-
Complete Setup of the Helper
Let's combine all the individual code into one script, along with the use of the physical buttons present on the Unihiker. The code is as follows:
import cv2
import time
import ocr4
import serial
import gtts
import playsound
from pinpong.board import *
from pinpong.extension.unihiker import *
from unihiker import Audio

audio = Audio()
Board().begin()

while True:
    if button_a.is_pressed() == True:
        # Button A: text reading mode - open the camera stream
        cap = cv2.VideoCapture(0)  # Replace with your camera index or video file path
        if not cap.isOpened():
            print("Error: Could not open video.")

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                print("End of video.")
                break

            # Display the frame (not needed on the device, just for debugging)
            cv2.imshow('Frame', frame)

            # Delay for 25 milliseconds (you can adjust this value)
            time.sleep(0.025)

            img_path = 'captured_image1.png'  # make sure this matches the filename ocr4.hello() reads
            cv2.imwrite(img_path, frame)
            print(f"Image saved as {img_path}")

            # Press button A again to stop capturing
            cv2.waitKey(1)
            if button_a.is_pressed() == True:
                break

        # Read the captured image aloud, then release the camera and close the windows
        ocr4.hello()
        cap.release()
        cv2.destroyAllWindows()

    if button_b.is_pressed() == True:
        time.sleep(1)
        # Button B: currency detection mode - read labels sent by the XIAO over USB serial
        ser = serial.Serial('/dev/ttyACM1', 115200, timeout=1)
        ser.flush()
        while True:  # stays in currency-detection mode
            if ser.in_waiting > 0:
                line = ser.readline().decode('utf-8').rstrip()

                if '10 rupee' in line:
                    print(True)  # '10 rupee' detected
                    time.sleep(5)
                    line1 = ' TEN RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money1.mp3")
                    audio.play("money1.mp3")

                if '500 rupee' in line:
                    print(True)  # '500 rupee' detected
                    time.sleep(5)
                    line1 = ' FIVE HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money2.mp3")
                    audio.play("money2.mp3")

                if '200 rupee' in line:
                    print(True)  # '200 rupee' detected
                    time.sleep(5)
                    line1 = 'TWO HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money3.mp3")
                    audio.play("money3.mp3")

                if '100 rupee' in line:
                    print(True)  # '100 rupee' detected
                    time.sleep(5)
                    line1 = 'HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money4.mp3")
                    audio.play("money4.mp3")

                if '50 rupee' in line:
                    print(True)  # '50 rupee' detected
                    time.sleep(5)
                    line1 = 'FIFTY RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money5.mp3")
                    audio.play("money5.mp3")
The complete working of the Helper subsystem is shown in the video below; it is just the above videos combined.
The Helper, as its name suggests, is one of the most important subsystems of this project: it creates a sense of independence for visually impaired persons. In the complete prototype, the Helper aims to make the scanning of visual information more efficient and faster.
THE PROTECTOR
The Protector comprises an STM32F4 Black Pill, ultrasonic sensors, and strong vibration feedback motors. Its job is simple: notify the user of objects in the range of roughly 50 cm to 400 cm (4 meters) and protect the wearer from accidental collisions. To estimate how fast an object is approaching, the difference between the old and new distance readings is used; as soon as any object approaches the user quickly, an ultra alert is triggered and the corresponding motor vibrates to prompt the user to move left, right, back, or forward.
Hardware Build Instructions and Code [ Protector ]
First, start with the following circuit layout for reference:
The pin diagram of the Black Pill is shown below.
The image of the circuit implemented in the belt:
- Connect the Black Pill to the Arduino IDE, go to File > Preferences > Additional Boards Manager URLs, and paste the following .json link: https://raw.githubusercontent.com/stm32duino/BoardManagerFiles/main/package_stmicroelectronics_index.json
- Download the STM32Cube software from the following link, STM32Cube Integrated Development Environment for STM32, and place it inside the Arduino folder.
- Now select the board in the Arduino IDE from Tools > Board > STM32 based MCUs > Generic STM32F4 series.
- Start with the conditions for notification through vibration feedback. Here the lower range is 50 cm, but 30 cm is more effective for this subsystem.
- Define the variables and the pin assignments for the Black Pill setup:
//left Top
int time_durationA=0;
int distance_oldA,distance_newA;
int delA,distanceCmA;
int left_top = PA0;
int trig_A = PB13;
int echo_A = PB14;
//right Top
int time_durationB=0;
int distance_oldB,distance_newB;
int delB,distanceCmB;
int right_top = PA1;
int trig_B = PB15;
int echo_B = PA8;
//left back
int time_durationC=0;
int distance_oldC,distance_newC;
int delC,distanceCmC;
int left_back = PA2;
int trig_C = PA9;
int echo_C = PA10;
//right back
int time_durationD=0;
int distance_oldD,distance_newD;
int delD,distanceCmD;
int right_back = PA3;
int trig_D = PA11;
int echo_D = PA15;
Three pin variables are required for each ultrasonic sensor along with one vibration feedback coin, for the left front, right front, left back, and right back positions.
- The void setup() of the Arduino code configures the following inputs and outputs:
void setup() {
    pinMode(left_top, OUTPUT);   // left top vibration motor
    pinMode(echo_A, INPUT);      // left top ultrasonic echo
    pinMode(trig_A, OUTPUT);     // left top ultrasonic trigger
    pinMode(right_top, OUTPUT);  // right top vibration motor
    pinMode(echo_B, INPUT);      // right top ultrasonic echo
    pinMode(trig_B, OUTPUT);     // right top ultrasonic trigger
    pinMode(left_back, OUTPUT);  // left back vibration motor
    pinMode(echo_C, INPUT);      // left back ultrasonic echo
    pinMode(trig_C, OUTPUT);     // left back ultrasonic trigger
    pinMode(right_back, OUTPUT); // right back vibration motor
    pinMode(echo_D, INPUT);      // right back ultrasonic echo
    pinMode(trig_D, OUTPUT);     // right back ultrasonic trigger
    Serial.begin(9600);
}
- The code below calculates the distance from each ultrasonic sensor's trigger and echo, and the change in position between readings. You can customise the threshold values according to the required range and the ultrasonic sensor's capability; a ToF sensor can also be used for better accuracy. The distance in centimetres is obtained from the echo time as duration_in_microseconds * 0.034 / 2, since sound travels roughly 0.034 cm per microsecond and the pulse covers the distance twice. The following code goes inside void loop():
// Each sensor is triggered and read in turn so that the echoes do not interfere,
// using the standard 10-microsecond trigger pulse for HC-SR04-style sensors.
distance_oldA = distance_newA; // previous readings, used to estimate how fast an object approaches
distance_oldB = distance_newB;
distance_oldC = distance_newC;
distance_oldD = distance_newD;

// Sensor A (left top)
digitalWrite(trig_A, LOW);
delayMicroseconds(2);
digitalWrite(trig_A, HIGH);
delayMicroseconds(10);
digitalWrite(trig_A, LOW);
time_durationA = pulseIn(echo_A, HIGH);
distanceCmA = (time_durationA * 0.034) / 2;
distance_newA = distanceCmA;

// Sensor B (right top)
digitalWrite(trig_B, LOW);
delayMicroseconds(2);
digitalWrite(trig_B, HIGH);
delayMicroseconds(10);
digitalWrite(trig_B, LOW);
time_durationB = pulseIn(echo_B, HIGH);
distanceCmB = (time_durationB * 0.034) / 2;
distance_newB = distanceCmB;

// Sensor C (left back)
digitalWrite(trig_C, LOW);
delayMicroseconds(2);
digitalWrite(trig_C, HIGH);
delayMicroseconds(10);
digitalWrite(trig_C, LOW);
time_durationC = pulseIn(echo_C, HIGH);
distanceCmC = (time_durationC * 0.034) / 2;
distance_newC = distanceCmC;

// Sensor D (right back)
digitalWrite(trig_D, LOW);
delayMicroseconds(2);
digitalWrite(trig_D, HIGH);
delayMicroseconds(10);
digitalWrite(trig_D, LOW);
time_durationD = pulseIn(echo_D, HIGH);
distanceCmD = (time_durationD * 0.034) / 2;
distance_newD = distanceCmD;

delay(2000); // pause between measurement cycles; this also sets the interval for the speed estimate

delA = abs(distance_newA - distance_oldA);
delB = abs(distance_newB - distance_oldB);
delC = abs(distance_newC - distance_oldC);
delD = abs(distance_newD - distance_oldD);
- The basic protection and notification logic for one sensor is given by:
// For Sensor A at the left top position
if (delA > 10 && 50 < distanceCmA && distanceCmA < 250) {
    analogWrite(left_top, 255); // ultra alert: something is approaching fast
    delay(6000);
}
if (250 < distanceCmA && distanceCmA < 350) {
    analogWrite(left_top, 60);
    delay(2000);
}
if (150 < distanceCmA && distanceCmA < 250) {
    analogWrite(left_top, 120);
    delay(2000);
}
if (100 < distanceCmA && distanceCmA < 150) {
    analogWrite(left_top, 180);
    delay(2000);
}
if (30 < distanceCmA && distanceCmA < 100) {
    analogWrite(left_top, 255);
    delay(4000);
}
Similarly, the code can be written for all the other ultrasonic sensors and positions, or an interrupt-based method can be used. The distance is checked on each iteration of the void loop() function, ensuring that the measurements are processed continuously. You can find more efficient code at the GitHub link.
The working of the Protector for one ultrasonic sensor can be seen here. Similarly, it can be done for all the other sensors as explained in the code.
The primary objective of the Protector device was to prevent accidental collisions for visually impaired individuals by detecting and responding to nearby objects that may pose a threat.
The vibration alerts the user and indicates the direction of the obstacle (left, right, back, or front) so it can be avoided; the device treats the user as a system and considers everything within a 30 cm radius to be part of that system, providing a simple yet effective solution to enhance user safety.
THE ASSEMBLY
The components of all three subsystems are kept inside a belt designed to be comfortable, and the pair of cameras is attached to a cap along with the speaker. [The wired setup is a bit less comfortable to use; if the budget allows, a wireless camera and speaker can be used instead.]
- The future vision for Nav-VIS involves several key advancements to enhance its functionality and user experience. One of the primary goals is to integrate voice control and descriptions of the surroundings, allowing users to interact with the device more naturally and intuitively, like a personal assistant, making it easier for visually impaired individuals to operate the system and ask Nav-VIS about their surroundings, such as the names of people, descriptions of things in front of them, or any scene.
- Additionally, custom PCB designs will be developed to consolidate and optimize all hardware components, reducing the overall size and weight of the device and thereby increasing comfort and wearability. Introducing advanced RealSense cameras and NVIDIA Jetson-level single-board computers would enable improved depth estimation with greater accuracy, allowing users to better navigate complex environments and recognize people around them.
- Moreover, by incorporating libraries like EasyOCR, the device will offer more precise and efficient text recognition, further aiding in identifying products, reading currency notes, accessing information from books or newspapers, and giving a complete description of edible products and their nutrient content. These upgrades will collectively make Nav-VIS a more powerful and versatile tool for visually impaired individuals, significantly improving their independence and quality of life.