This project demonstrates how to build a haptic-navigation device with a visual-information detection setup for visually impaired people.
The device is designed to be worn as a belt on the waist together with a cap. It helps the user navigate through binocular vision and depth analysis, acting as a 'precision guide' with vibration-based feedback, and it also provides an audio-based text-reading helper along with product and currency detection.
People with moderate or severe vision loss can use this device to travel safely, navigate around obstacles, stay safe from accidental collisions, read books or newspapers, and shop independently.
Build2gether Challenge
The project takes inspiration from the Build2gether 2.0 - Inclusive Innovation Challenge, a contest that highlights the transformative role of assistive technologies and encourages participants to look beyond conventional methods and create innovative, useful, and novel solutions for people with disabilities.
The contest focused on two primary areas of innovation: support for visually impaired and mobility-impaired individuals. The competition was divided into two main tracks, each dedicated to developing adaptations for outdoor or indoor activities tailored to the needs of these individuals. The judging panel, composed of experts who share the same background as the target groups, ensured that the evaluation process was thorough and empathetic to the real-world challenges these communities face.
The contest provided participants with a unique opportunity to engage in an interactive and collaborative environment, fostering the creation of novel solutions that can significantly enhance the quality of life for those with visual or mobility impairments.
Problem Identification
As mentioned, the competition focused on addressing the challenges faced by visually impaired and mobility-impaired individuals, with a particular emphasis on two key areas: adaptations for outdoor and indoor activities. Given the global context, where approximately 2.2 billion people live with some form of vision impairment according to the World Health Organization (WHO), the importance of such innovations cannot be overstated. Among these individuals, 36 million are blind, and 217 million have moderate or severe vision impairment, highlighting the pressing need for technological solutions that can enhance their daily lives.
1] SAFE TRAVEL AND NAVIGATION
My focus was on the track dedicated mainly to outdoor activities, with some indoor activities such as reading. Through surveys I identified specific problems so I could develop a solution that would significantly improve the mobility and independence of visually impaired individuals. The first issue I addressed was the challenge of safe travel in unfamiliar environments. Despite the use of mobility canes, visually impaired individuals often face difficulties detecting obstacles beyond the cane's reach, leading to potential collisions, difficulty navigating crowded areas, and trouble with sudden changes in terrain. These obstacles can make independent travel a daunting and sometimes dangerous experience.
2] ACCESS TO VISUAL INFORMATION
Another major problem I focused on is the inability of visually impaired individuals to access visual information, such as reading currency notes or product details while shopping. This lack of access not only limits their independence but also creates a psychological burden of trust and insecurity when dealing with financial transactions and daily needs. For example, my relative, who lacks light perception, struggles with tasks as simple as identifying currency or understanding product labels, which are not universally available in braille. This issue highlights a broader societal challenge in making everyday tasks more accessible for those with vision impairments.
By focusing on these specific problems, I aimed to contribute to a more inclusive society where visually impaired individuals can navigate their environments and manage their daily tasks related to visual information with greater ease and confidence.
Developing An Elegant Solution
My solution is an innovative smart belt paired with a cap, designed specifically for visually impaired individuals to enhance their ability to navigate various environments safely and independently, along with the ability to detect currency notes, read text from books or newspapers, and access product details. This wearable assistive technology integrates a high-resolution camera, a Seeed Studio XIAO ESP32S3 Sense, ultrasonic sensors, haptic (vibration) feedback modules, gyroscopic sensors, a small speaker, an STM32F4 Black Pill microcontroller, and single-board computers such as the Unihiker and Raspberry Pi 5, all orchestrated to provide real-time guidance to visually impaired individuals.
The device is a real-time assistive technology named Navigator and Visual Information Scanner [Nav-VIS].
It is divided into three subsystems as follows:
1] The Guide - The navigation component of the device. It contains vibration feedback and a Raspberry Pi 5* (8 GB) with two RGB cameras for binocular vision, pre-trained metric depth models for the depth map, and a custom-trained YOLOv9-based model for object detection, along with feedback from ultrasonic and accelerometer sensors via the STM32 Black Pill. Together these feed an A* algorithm that navigates in real time through crowded environments.
2] The Protector - The main purpose of this subsystem is to protect the user from sudden, unwanted collisions. It is designed for high-urgency alerts: it vibrates strongly when an object approaches the user too quickly, and it also notifies the user of nearby obstacles through vibration feedback. It mainly consists of an STM32F4, vibration motors, and ultrasonic sensors.
3] The Helper - An assistive subsystem designed solely for text reading, product detection, and currency detection. It uses Seeed Studio's XIAO ESP32S3 Sense camera for currency classification (custom-trained with Edge Impulse Studio), a small camera (webcam) for text and product detection, and the Unihiker* (a Linux-based single-board computer) as its core computer, with libraries such as Tesseract.
*Using a single-board computer like an NVIDIA Jetson would provide higher precision and accuracy.
**For depth estimation, the use of high-precision LiDARs or RealSense cameras was deliberately avoided due to two primary factors. Firstly, the financial and computational demands of these sensors are significant, which may not be feasible for a cost-effective and portable solution. Secondly, LiDAR technology involves the continuous emission of laser radiation, which raises concerns regarding safety and energy consumption in prolonged use, and such technology isn't budget-friendly.
The Hardware Design
Let's understand how the device works and how its hardware is structured into three subsystems to handle the tasks mentioned above. Here is the flowchart overview; we will then go through each subsystem individually.
The product hardware setup and design of all three subsystems are explained below in detail.
THE GUIDE
The Guide uses a novel methodology to navigate around obstacles using object detection and metric depth estimation. The depth analysis is done with a pre-trained deep-learning model, Depth Anything, which provides both indoor and outdoor pre-trained models for robust and accurate depth estimation. The Guide subsystem has two RGB cameras arranged so that the input is captured as binocular vision with approximately a 150-degree field of view, using a standard OpenCV program for video capture.
Each video frame is processed continuously: the custom-trained object detection model (COCO dataset) is deployed to find specific objects or obstacles in the field of view, then the Depth Anything metric depth analysis produces a depth map of that frame, which is scaled to a grid of relative depths, and the A* algorithm is applied with the farthest reachable position as the destination. The A* algorithm returns the shortest path that avoids the obstacles. Haptic feedback is driven from the GPIO pins of the Raspberry Pi to the vibration motor coins for moving forward, backward, left, and right. To keep the environment consistent, the code is organised within a Python virtual environment and runs as a Python process on the Pi.
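The article does not list the exact GPIO wiring of the vibration motor coins, so the fragment below is only a rough sketch of how the haptic feedback could be driven with gpiozero; the pin numbers are hypothetical placeholders, and the navigation code would call buzz() with the direction derived from the first step of the A* path.

# Illustrative sketch only: driving the four vibration motor coins from the Pi's GPIO
# with gpiozero. BCM pins 17, 27, 22 and 23 are placeholders, not the project's wiring.
from gpiozero import OutputDevice
from time import sleep

motors = {
    "forward": OutputDevice(17),
    "backward": OutputDevice(27),
    "left": OutputDevice(22),
    "right": OutputDevice(23),
}

def buzz(direction, duration=0.5):
    """Pulse the motor for the given direction (e.g. the first step of the A* path)."""
    motor = motors[direction]
    motor.on()
    sleep(duration)
    motor.off()

# Example: tell the user to step to the left
buzz("left")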
Hardware Build Instructions and Code [ Guide ]
The connections for the Guide subsystem are the simplest. Starting with its core computer, flash the Raspberry Pi 5 from an SD card or USB drive with Raspberry Pi OS (Legacy, 64-bit) Full. Then, after connecting the Raspberry Pi to a monitor, mouse, and keyboard, start by installing or upgrading the following Python (version 3.9) libraries:
gradio_imageslider
gradio==4.29.0
matplotlib
opencv-python
torch
torchvision
If you face any problem with torch (since the Raspberry Pi uses an ARM architecture, it may show "illegal instruction" errors), the article on this website explains how to install it for the RPi here.
For connection refer to the following image:
TRAINED MODEL - METRIC DEPTH ESTIMATION
The code for the following Guide navigation setup is uploaded to this GitHub repository, which contains three different files and a custom-trained model for object detection. You can refer to the Depth Anything GitHub page for the depth-anything model, which provides three models; this project uses the Base model. Depth Anything uses a data engine to automate the annotation process for the vast corpus of unlabeled images it harnesses.
The code takes various arguments: the webcam/video path, the name and path of the depth-anything model you are using, and the encoder type.
Here are some previews of this model (credits: Depth Anything).
The Depth Anything model is highly accurate for depth estimation. It produces a metric depth map, but we are not interested in the depth values of the complete image, only in the depths of the specific objects that act as obstacles in the surroundings.
Here is the step-by-step thought process and working of the code:
- First, boot up the RPi and create a folder for this subsystem. Start by importing the libraries; we are using VS Code for editing (any IDE compatible with the RPi can be used).
import argparse
import cv2
import numpy as np
import os
import torch
import time
import matplotlib.cm
import check
from depth_anything_v2.dpt import DepthAnythingV2
# If you are wondering what 'check' is: it is just a second Python file that sends
# the coordinates of detected objects to the A* algorithm
To import the Depth Anything model, you can either git clone Depth-Anything-V2 or place the downloaded files directly inside the folder that contains this Python file.
- Download the custom-trained object detection model from GitHub and place it in this folder as well. Then we just need to give the paths to the model and dataset files.
classNames = []
classFile = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/coco.names"
with open(classFile, "rt") as f:
    classNames = f.read().rstrip("\n").split("\n")

configPath = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
weightsPath = "Depth-Anything-V2/metric_depth/cocoobject/Object_Detection_Files/frozen_inference_graph.pb"
- Now we deploy the model and define an object detection function, which will be called later:
net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

def getObjects(img, thres, nms, draw=True, objects=[]):
    classIds, confs, bbox = net.detect(img, confThreshold=thres, nmsThreshold=nms)
    objectInfo = []
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]
            if className in objects:
                objectInfo.append([box, className])
                if draw:
                    cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
                    cv2.putText(img, classNames[classId - 1].upper(), (box[0], box[1] - 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
                    cv2.putText(img, str(round(confidence * 100, 2)), (box[0] - 200, box[1] - 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    return img, objectInfo
- Next we process each frame and locate the custom objects we care about with bounding boxes. We do this by creating a function that is called whenever OpenCV captures an image from the video; the frame rate naturally depends on how fast each image is processed. The function first draws a bounding box on each custom object, then reads the depth of that object, and finally sends the data to the other file (defined later).
The depth estimation assumes that each object's coordinate is the centre of its detected bounding box and lies in a 2D plane; it can be further converted into a 3D reconstruction using the depth map, as in the sketch after the function below.
def process_frame(frame, depth_anything, args, frame_count):
    raw_image = frame
    coordinates = []

    # Perform the object detection
    result, objectInfo = getObjects(raw_image, 0.45, 0.2,
                                    objects=['person', 'chair', 'table', 'dining table', 'car'])

    # Print metric depth only where the objects we care about are detected
    if len(objectInfo) > 0:
        print(f'Frame {frame_count}: Metric depth (in meters) at object locations:')
        for box, className in objectInfo:
            x, y, w, h = box
            center_x = x + w // 2
            center_y = y + h // 2
            # Ensure the centre coordinates are within the image bounds
            if center_y < 0 or center_y >= raw_image.shape[0] or center_x < 0 or center_x >= raw_image.shape[1]:
                continue
            depth = depth_anything.infer_image(raw_image, args.input_size)
            depth_meter = depth[center_y, center_x] * args.max_depth / 255.0
            # time.sleep(2)  # Uncomment only when cross-checking the output
            center_x = int(center_x * 0.075) - 17   # scale the image x-coordinate to the grid
            depth_meter = int(depth_meter) + 1      # scale the depth to grid columns
            print(depth_meter, center_x)
            coordinates.append([center_x, depth_meter])
            print(f'Object: {className}, Depth: {depth_meter} meters', center_x)
        # 'check' sends the collected coordinates to the A* algorithm
        check.Find_mypath(coordinates)
        print()

    # Perform depth estimation on the entire frame
    depth = depth_anything.infer_image(raw_image, args.input_size)

    # Normalise and convert the depth to a visual representation for the depth map
    depth_visual = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
    depth_visual = depth_visual.astype(np.uint8)
    depth_meters = depth * args.max_depth / 255.0  # full-frame metric depths (not used further here)

    if args.grayscale:
        depth_visual = cv2.cvtColor(depth_visual, cv2.COLOR_GRAY2BGR)
    else:
        cmap = matplotlib.cm.get_cmap('Spectral')
        depth_visual = (cmap(depth_visual)[:, :, :3] * 255).astype(np.uint8)

    # Combine the original frame with the depth visualisation
    split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
    combined_result = np.hstack([raw_image, split_region, depth_visual])

    return combined_result
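As noted in the step above, each obstacle is treated as a single point on the 2D image plane. If a full 3D position were ever needed, the centre pixel and its metric depth could be back-projected with a basic pinhole-camera model. The snippet below is only an illustration; the intrinsics fx, fy, cx, and cy are placeholder values that would normally come from calibrating the cameras.

# Illustrative sketch: back-projecting a pixel plus metric depth into a 3D point.
# The intrinsics below are placeholder values, not calibrated camera parameters.
import numpy as np

def pixel_to_3d(u, v, depth_m, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Return the (X, Y, Z) camera-frame coordinates of pixel (u, v) at depth_m metres."""
    X = (u - cx) * depth_m / fx
    Y = (v - cy) * depth_m / fy
    Z = depth_m
    return np.array([X, Y, Z])

# Example: a person detected at pixel (400, 260) and 3 m away
print(pixel_to_3d(400, 260, 3.0))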
- Now that all the functions are created, let's deploy the metric depth estimation model. The script estimates depth from a webcam feed in real time using the Depth Anything V2 model. It uses argparse for command-line arguments, torch for model handling, and OpenCV for video processing and display. The Depth Anything V2 model is initialized with the selected configuration, the weights are loaded from the specified checkpoint file, and the model is moved to the specified device (GPU or CPU) and set to evaluation mode; here it is the CPU, since the Pi's ARM processor has no CUDA GPU.
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Depth Anything V2 Metric Depth Estimation on Webcam')
    parser.add_argument('--video-path', type=int, default=4, help='Webcam index for video capture (default: 4)')
    parser.add_argument('--input-size', type=int, default=1, help='Input size for image processing')
    parser.add_argument('--outdir', type=str, default='./vis_depth', help='Output directory')
    parser.add_argument('--encoder', type=str, default='vits', choices=['vits', 'vitb', 'vitl', 'vitg'], help='Encoder type')
    parser.add_argument('--load-from', type=str, default='/home/dsay/Documents/hackster/depthanything/Depth-Anything-V2/metric_depth/checkpoints/depth_anything_v2_metric_hypersim_vits.pth', help='Path to model checkpoint')
    parser.add_argument('--max-depth', type=float, default=50, help='Maximum depth value')
    parser.add_argument('--save-numpy', dest='save_numpy', action='store_true', help='Save the model raw output')
    parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='Only display the depth prediction')
    parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='Do not apply colorful palette')
    args = parser.parse_args()

    # Initialise the device
    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Initialise the model configuration based on the chosen encoder
    model_configs = {
        'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
        'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
        'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
        'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
    }

    # Initialise the DepthAnythingV2 model
    depth_anything = DepthAnythingV2(**{**model_configs[args.encoder], 'max_depth': args.max_depth})
    depth_anything.load_state_dict(torch.load(args.load_from, map_location='cpu'))
    depth_anything = depth_anything.to(DEVICE).eval()
Here are some pictures (screenshots) of how the depth map is generated:
- Now we do the normal video capture under the main script. This code captures video frames from a webcam or video file, processes each frame using the depth estimation model, and displays the results in real time. The loop continues until the user presses the 'q' key or the video ends. After the loop, the resources are released and any open windows are closed. On the Raspberry Pi, a push button is wired in as well; if it is pressed, the video depth estimation stops (a sketch of that button check follows the code below).
    cap = cv2.VideoCapture(args.video_path)
    frame_count = 0

    # Create the output directory if it does not exist
    os.makedirs(args.outdir, exist_ok=True)

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_count += 1
        combined_result = process_frame(frame, depth_anything, args, frame_count)
        cv2.imshow('Depth Estimation', combined_result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the webcam capture and close all windows
    cap.release()
    cv2.destroyAllWindows()
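The wiring of the stop button is not shown in this article, so the following is only a sketch of how the push-button check could replace the 'q' key on the Pi, assuming the button sits on a hypothetical GPIO 21 and using gpiozero.

# Sketch only: stopping the capture loop with a physical push button on the Pi.
# GPIO 21 is a placeholder; use whatever pin the button is actually wired to.
from gpiozero import Button
import cv2

stop_button = Button(21)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret or stop_button.is_pressed:   # stop when the button is pressed
        break
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()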
Navigation Algorithm
The Python script (as mentioned above, I named it check.py) implements the A* algorithm, a popular pathfinding and graph-traversal technique used to find the shortest path between two points on a grid. In this case, the grid is a 15x5 matrix where each cell can either be walkable (unblocked) or non-walkable (blocked). The primary goal of the script is to navigate from a starting point (referred to as the source) to a destination on this grid, avoiding obstacles and finding the most efficient route.
The grid is initially set up with all cells marked as unblocked, meaning they are walkable. However, the user can specify certain cells to be blocked, making them impassable. This flexibility allows the simulation of different environments where obstacles might be present. The grid is represented as a two-dimensional list, where each element is either a 1 (indicating the cell is unblocked) or a 0 (indicating the cell is blocked). The A* algorithm then operates on this grid to find the optimal path from the source to the destination.
Below is the code for a standard A* algorithm; you can watch this video by MATLAB for a better explanation of the concepts behind how it works:
import math
import heapq

### Define the Cell class ###
class Cell:
    def __init__(self):
        self.parent_i = 0  # Parent cell's row index
        self.parent_j = 0  # Parent cell's column index
        self.f = float('inf')  # Total cost of the cell (g + h)
        self.g = float('inf')  # Cost from start to this cell
        self.h = 0  # Heuristic cost from this cell to destination
# Define the size of the grid
ROW = 15
COL = 5

# Check if a cell is valid (within the grid)
def is_valid(row, col):
    return (row >= 0) and (row < ROW) and (col >= 0) and (col < COL)

# Check if a cell is unblocked
def is_unblocked(grid, row, col):
    return grid[row][col] == 1

# Check if a cell is the destination
def is_destination(row, col, dest):
    return row == dest[0] and col == dest[1]

# Calculate the heuristic value of a cell (Euclidean distance to destination)
def calculate_h_value(row, col, dest):
    return ((row - dest[0]) ** 2 + (col - dest[1]) ** 2) ** 0.5

# Trace the path from source to destination
def trace_path(cell_details, dest):
    print("The Path is ")
    path = []
    row = dest[0]
    col = dest[1]

    # Trace the path from destination to source using parent cells
    while not (cell_details[row][col].parent_i == row and cell_details[row][col].parent_j == col):
        path.append((row, col))
        temp_row = cell_details[row][col].parent_i
        temp_col = cell_details[row][col].parent_j
        row = temp_row
        col = temp_col

    # Add the source cell to the path
    path.append((row, col))
    # Reverse the path to get the path from source to destination
    path.reverse()

    # Print the path
    for i in path:
        print("->", i, end=" ")
    print()
# Implement the A* search algorithm
def a_star_search(grid, src, dest):
    # Check if the source and destination are valid
    if not is_valid(src[0], src[1]) or not is_valid(dest[0], dest[1]):
        print("Source or destination is invalid")
        return

    # Check if the source and destination are unblocked
    if not is_unblocked(grid, src[0], src[1]) or not is_unblocked(grid, dest[0], dest[1]):
        print("Source or the destination is blocked")
        return

    # Check if we are already at the destination
    if is_destination(src[0], src[1], dest):
        print("We are already at the destination")
        return

    # Initialize the closed list (visited cells)
    closed_list = [[False for _ in range(COL)] for _ in range(ROW)]
    # Initialize the details of each cell
    cell_details = [[Cell() for _ in range(COL)] for _ in range(ROW)]

    # Initialize the start cell details
    i = src[0]
    j = src[1]
    cell_details[i][j].f = 0
    cell_details[i][j].g = 0
    cell_details[i][j].h = 0
    cell_details[i][j].parent_i = i
    cell_details[i][j].parent_j = j

    # Initialize the open list (cells to be visited) with the start cell
    open_list = []
    heapq.heappush(open_list, (0.0, i, j))

    # Initialize the flag for whether the destination is found
    found_dest = False

    # Main loop of the A* search algorithm
    while len(open_list) > 0:
        # Pop the cell with the smallest f value from the open list
        p = heapq.heappop(open_list)

        # Mark the cell as visited
        i = p[1]
        j = p[2]
        closed_list[i][j] = True

        # For each direction, check the successors
        directions = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
        for dir in directions:
            new_i = i + dir[0]
            new_j = j + dir[1]

            # If the successor is valid, unblocked, and not visited
            if is_valid(new_i, new_j) and is_unblocked(grid, new_i, new_j) and not closed_list[new_i][new_j]:
                # If the successor is the destination
                if is_destination(new_i, new_j, dest):
                    # Set the parent of the destination cell
                    cell_details[new_i][new_j].parent_i = i
                    cell_details[new_i][new_j].parent_j = j
                    print("The destination cell is found")
                    # Trace and print the path from source to destination
                    trace_path(cell_details, dest)
                    found_dest = True
                    return
                else:
                    # Calculate the new f, g, and h values
                    g_new = cell_details[i][j].g + 1.0
                    h_new = calculate_h_value(new_i, new_j, dest)
                    f_new = g_new + h_new

                    # If the cell is not in the open list or the new f value is smaller
                    if cell_details[new_i][new_j].f == float('inf') or cell_details[new_i][new_j].f > f_new:
                        # Add the cell to the open list
                        heapq.heappush(open_list, (f_new, new_i, new_j))
                        # Update the cell details
                        cell_details[new_i][new_j].f = f_new
                        cell_details[new_i][new_j].g = g_new
                        cell_details[new_i][new_j].h = h_new
                        cell_details[new_i][new_j].parent_i = i
                        cell_details[new_i][new_j].parent_j = j

    # If the destination is not found after visiting all cells
    if not found_dest:
        print("Failed to find the destination cell")
def main():
    # Define the size of the grid
    ROW = 15
    COL = 5

    # Initialize the grid with all elements as 1 (unblocked)
    grid = [[1] * COL for _ in range(ROW)]

    # Input number of blocked cells
    # num_blocked = int(input("Enter number of cells to block: "))

    # Input indices of cells to be blocked
    for n in range(2):
        row = int(input(f"Enter row index for blocked cell {n + 1}: "))
        col = int(input(f"Enter column index for blocked cell {n + 1}: "))
        # Check if the indices are within bounds
        if 0 <= row < ROW and 0 <= col < COL:
            grid[row][col] = 0
        else:
            print(f"Invalid indices ({row}, {col}). Ignoring this cell.")

    # Print the grid to show the initial setup
    # print("Initial Grid:")
    # for row in grid:
    #     print(row)

    # Input source and destination
    # src_row = int(input("Enter source row: "))
    # src_col = int(input("Enter source column: "))
    # dest_row = int(input("Enter destination row: "))
    # dest_col = int(input("Enter destination column: "))
    src = [7, 0]
    dest = [7, 4]

    # Run the A* search algorithm
    a_star_search(grid, src, dest)

if __name__ == "__main__":
    main()
In the script, the main() function sets up the grid and allows the user to block certain cells by specifying their row and column indices. After setting up the grid, the A* search algorithm is executed, starting from a predefined source cell at coordinates (7, 0) and attempting to reach a destination at coordinates (7, 4). If a path is found, the algorithm prints the sequence of grid cells that make up the path; if not, it informs the user that no valid path exists.
The video for depth estimation and the A* algorithm shows a recording that simulates the real-world scenario being analyzed. Since it is not possible to directly show what the camera observes in the actual environment, the simulation is visualized on a monitor through screen recording. This makes it easier to analyze and understand how the depth estimation and pathfinding would operate in a real-world setting. The use of the device itself is already shown in the video above.
Since we assumed a 15x5 grid on a 2D plane, the x values (the x coordinate of the image frame scaled to the real world) and the z values (the measured depths) are placed onto that grid; the user is assumed to be at the centre of the near edge of the grid, and the A* algorithm heads toward the farthest point of the grid. A sketch of how the detected coordinates could be turned into this grid is shown below. The video for the code simulation is:
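The listing above shows only main(); the Find_mypath() entry point that process_frame() calls is not reproduced in the article. The fragment below is therefore a hypothetical sketch of how the detected [x, depth] pairs could be mapped onto the 15x5 grid before calling a_star_search(); the clamping and offsets are assumptions, while the fixed source (7, 0) and destination (7, 4) follow the values used above.

# Hypothetical sketch of check.Find_mypath: turn detected [x, depth] pairs into
# blocked cells on the 15x5 grid and run the A* search defined above.
ROW = 15   # lateral cells (user at row 7, the centre of the near edge), as defined above
COL = 5    # depth cells (column 0 is nearest to the user)

def Find_mypath(coordinates):
    # Start with every cell walkable
    grid = [[1] * COL for _ in range(ROW)]

    # Block the cell corresponding to each detected obstacle
    for x, depth in coordinates:
        row = max(0, min(ROW - 1, x + 7))       # shift/clamp the scaled x onto rows 0..14 (assumed mapping)
        col = max(0, min(COL - 1, depth - 1))   # clamp the scaled depth onto columns 0..4
        grid[row][col] = 0

    # The user stands at (7, 0); aim for the farthest point straight ahead
    a_star_search(grid, [7, 0], [7, 4])

# Example: one obstacle slightly to the right, about two grid cells away
Find_mypath([[2, 2]])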
THE HELPER
The Helper is an assistive subsystem specifically designed to support visually impaired individuals by facilitating text reading, product identification, and currency detection. At its core, the system uses Seeed Studio's XIAO ESP32S3 Sense, a compact camera module used here for accurate currency classification. It is custom-trained with Edge Impulse (tinyML) to distinguish between different denominations of Indian currency with precision, ensuring reliable and quick identification during transactions.
In addition to currency detection, the subsystem incorporates a small, high-resolution camera (webcam) dedicated to recognizing and reading text from various sources, such as product labels, books, and other printed materials. It uses the Tesseract OCR engine via Python, an industry-standard Optical Character Recognition (OCR) engine capable of converting printed text in images into machine-readable text with high accuracy. The camera captures the visual data, which is processed by the Unihiker, a compact Linux-based single-board computer known for its robust performance and versatility. The detected text or currency is then converted to speech with the gTTS library, and an audio amplifier and speaker are used since the Unihiker supports audio output.
Hardware Build Instructions and Code [ Helper ]
The following are the circuit images and diagram of the Helper.
Circuit assembled originally-
Circuit diagram for Audio Output
Soldered into the board -
A zoomed-in image of the camera and the currency detector using the XIAO ESP32S3 Sense.
The USB cables of the scanner are connected to a USB hub, and the USB hub is then connected to the USB Type-A port of the Unihiker.
Unihiker Setup
Let's start with the Unihiker. For all the required libraries and updates, plug in the Unihiker and connect it to a network; you can follow the Unihiker startup guide.
- Run the following commands in the Unihiker's terminal to install the libraries. Check whether the Python version is 3.9; if not, update it.
sudo apt-get update
sudo apt-get upgrade
pip install opencv-python
pip install numpy
pip install pytesseract
sudo apt-get install tesseract-ocr
pip install gtts
pip install playsound
pip install pyserial
** We can use pyttsx3 for offline text-to-speech; a minimal sketch is shown below.
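If a network connection cannot be guaranteed, the gTTS calls used below can be swapped for pyttsx3, which synthesises speech locally. This is only a sketch of that alternative (it assumes pyttsx3 and a local speech engine such as espeak are installed on the Unihiker), not part of the original build.

# Sketch: offline text-to-speech with pyttsx3 as a drop-in alternative to gTTS.
import pyttsx3

engine = pyttsx3.init()          # uses the locally installed speech engine
engine.setProperty('rate', 150)  # words per minute; adjust to taste

def speak(text):
    """Speak the given text without needing an internet connection."""
    engine.say(text)
    engine.runAndWait()

speak("Five hundred rupee note detected")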
- We will use buttons A and B for selecting text reading or money detection, respectively, so we require the pinpong library, which can be installed using the instructions given in the guide above.
For text detection, we will create two Python files: one for capturing an image from the live camera video, and the other for converting the captured image to audio. To keep things simple, let's first understand the second Python file, which receives an image.
- In the Unihiker's terminal, run nano "name_of_python_file" with the .py extension to edit a Python file (here the file is named ocr4.py). Start by importing the following libraries in ocr4.py:
import pytesseract
import time
import gtts
import playsound
- Define a function that is called when the image is captured, so that pytesseract can extract the text from the image into a string and pass it to gTTS for conversion into sound. The EasyOCR model was found to be much faster and more accurate for image-to-text conversion, but the Unihiker does not support the standard EasyOCR model (see the sketch after the function below).
def hello():
    time.sleep(1)  # time lag so the captured image has finished writing
    imgchar = pytesseract.image_to_string('captured_image.png')
    sound = gtts.gTTS(imgchar, lang='en')
    sound.save("hello.mp3")
    playsound.playsound("hello.mp3")
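For reference, the EasyOCR alternative mentioned above would look roughly like the snippet below on hardware powerful enough to run it (for example the Jetson-class boards discussed later); it is not used on the Unihiker in this build.

# Sketch: the EasyOCR alternative, for more capable hardware than the Unihiker.
import easyocr

reader = easyocr.Reader(['en'])                  # downloads the English model on first run
results = reader.readtext('captured_image.png')  # list of (bbox, text, confidence)
text = " ".join(item[1] for item in results)
print(text)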
- Finally, create a Python file that captures an image using OpenCV, saves it to the described path, and imports the file above to run OCR on it.
import cv2
import time
import ocr4  # the import name is ocr4 because that is the name of the Python file above

# Open the video file or camera stream
cap = cv2.VideoCapture(0)  # Replace with your video path/index; the default camera index is 0

if not cap.isOpened():
    print("Error: Could not open video.")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("End of video.")
        break

    # cv2.imshow('Frame', frame)  # Uncomment this if you want to see the captured image

    # Delay for 25 milliseconds (you can adjust this value)
    time.sleep(0.025)

    img_path = 'captured_image.png'
    cv2.imwrite(img_path, frame)
    print(f"Image saved as {img_path}")

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Read the last captured image aloud (calling the file described above)
ocr4.hello()

cap.release()
cv2.destroyAllWindows()
This was the text detection part; the complete code customised for the Unihiker will be explained below. For object detection, one can reuse the same object detection file used in the navigation subsystem.
Here is a small video of Text reading-
Currency Classification
This part of the Helper subsystem is executed using the Seeed Studio XIAO ESP32S3 Sense, as there is already a proper step-by-step explanation of how to use this module for image classification.
Following the methods given in the article above, I trained the model using 91 images of various currency denominations used in India, including the ten-rupee coin and the fifty, one hundred, two hundred, and five hundred rupee notes.
To deploy the classification model using the Arduino IDE, follow this method:
- Connect the module to the Arduino IDE, go to File > Preferences > Additional Boards Manager URLs, and paste the following .json links: https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_dev_index.json https://files.seeedstudio.com/arduino/package_seeeduino_boards_index.json
- You then need to install the Seeeduino XIAO board package from the Boards Manager.
- When the installation starts, you will see an output pop-up window. After the installation is complete, an “INSTALLED” option will appear
Now, after following the methods given in the image classification guide and deploying the model, open the example sketch from the examples menu (examples > esp32 > esp32_camera) and change the camera pin definitions on lines 32-75 of the sketch as given below:
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 10
#define SIOD_GPIO_NUM 40
#define SIOC_GPIO_NUM 39
#define Y9_GPIO_NUM 48
#define Y8_GPIO_NUM 11
#define Y7_GPIO_NUM 12
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 16
#define Y4_GPIO_NUM 18
#define Y3_GPIO_NUM 17
#define Y2_GPIO_NUM 15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM 47
#define PCLK_GPIO_NUM 13
We will get the detected class name and the confidence of the detected currency as string values in the Serial Monitor of the Arduino IDE. Comment out all the unnecessary prints and keep only the line that prints the name of the detected currency over serial. Since we will use a USB connection to the Unihiker to receive this serial output, set the baud rate to 115200 in the Arduino sketch.
As we have already installed pyserial on the Unihiker, check whether we are receiving the string values of the detected money after connecting the XIAO ESP32S3 Sense to the Unihiker through the USB hub, using the following code.
import serial

if __name__ == '__main__':
    ser = serial.Serial('/dev/ttyACM0', 115200, timeout=1)
    ser.flush()
    while True:
        if ser.in_waiting > 0:
            line = ser.readline().decode('utf-8').rstrip()
            print(line)  # print the detected currency label sent by the XIAO
As we can see below, the currency is being detected and shown in the terminal through serial communication; the Unihiker's USB serial support makes it easy to integrate with the XIAO ESP32S3 Sense camera.
REMARK - The port '/dev/ttyACM0' may change its index to 'ttyACM1' or some other value after the Unihiker reboots, so you need to check which port is in use (a small auto-detection sketch is shown below).
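Instead of trial and error, pyserial can list the available ports. The snippet below is an optional sketch, not part of the original code, that picks the first ttyACM device it finds.

# Optional sketch: auto-detect the XIAO's serial port instead of hard-coding it.
import serial
from serial.tools import list_ports

def find_xiao_port():
    """Return the first /dev/ttyACM* device found, or None."""
    for port in list_ports.comports():
        if 'ttyACM' in port.device:
            return port.device
    return None

port = find_xiao_port()
if port is None:
    raise RuntimeError("No ttyACM device found - is the XIAO plugged into the USB hub?")
ser = serial.Serial(port, 115200, timeout=1)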
Here is a video that shows currency detection-
Complete Setup of the Helper
Let's combine all the individual code into one script, along with the use of the physical buttons present on the Unihiker. The code is as follows:
import cv2
import time
import ocr4
import serial
import gtts
import playsound
from pinpong.board import *
from pinpong.extension.unihiker import *
from unihiker import Audio

audio = Audio()
Board().begin()

while True:
    if button_a.is_pressed() == True:
        # Button A: text reading mode - open the camera stream
        cap = cv2.VideoCapture(0)  # Replace with your camera index or video file path
        if not cap.isOpened():
            print("Error: Could not open video.")

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                print("End of video.")
                break

            # Display the frame (not needed on the device, just for debugging)
            cv2.imshow('Frame', frame)

            # Delay for 25 milliseconds (you can adjust this value)
            time.sleep(0.025)

            img_path = 'captured_image1.png'  # make sure this matches the filename ocr4.hello() reads
            cv2.imwrite(img_path, frame)
            print(f"Image saved as {img_path}")

            # Press button A again to stop capturing
            cv2.waitKey(1)
            if button_a.is_pressed() == True:
                break

        # Read the captured image aloud, then release the camera and close the windows
        ocr4.hello()
        cap.release()
        cv2.destroyAllWindows()

    if button_b.is_pressed() == True:
        time.sleep(1)
        # Button B: currency detection mode - read labels sent by the XIAO over USB serial
        ser = serial.Serial('/dev/ttyACM1', 115200, timeout=1)
        ser.flush()
        while True:  # stays in currency-detection mode
            if ser.in_waiting > 0:
                line = ser.readline().decode('utf-8').rstrip()

                if '10 rupee' in line:
                    print(True)  # '10 rupee' detected
                    time.sleep(5)
                    line1 = ' TEN RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money1.mp3")
                    audio.play("money1.mp3")

                if '500 rupee' in line:
                    print(True)  # '500 rupee' detected
                    time.sleep(5)
                    line1 = ' FIVE HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money2.mp3")
                    audio.play("money2.mp3")

                if '200 rupee' in line:
                    print(True)  # '200 rupee' detected
                    time.sleep(5)
                    line1 = 'TWO HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money3.mp3")
                    audio.play("money3.mp3")

                if '100 rupee' in line:
                    print(True)  # '100 rupee' detected
                    time.sleep(5)
                    line1 = 'HUNDRED RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money4.mp3")
                    audio.play("money4.mp3")

                if '50 rupee' in line:
                    print(True)  # '50 rupee' detected
                    time.sleep(5)
                    line1 = 'FIFTY RUPEE '
                    sound = gtts.gTTS(line1, lang='en')
                    sound.save("money5.mp3")
                    audio.play("money5.mp3")
The complete working of the Helper subsystem is shown in the video below; it is just the above videos combined.
The Helper, as its name suggests, is one of the most important subsystems of this project: it creates a sense of independence for visually impaired persons. In the complete prototype, the Helper aims to make the scanning of visual information more efficient and faster.
THE PROTECTOR
The Protector comprises an STM32F4 Black Pill, ultrasonic sensors, and strong vibration feedback motors. Its job is simple: notify the user of objects in the range of roughly 50 cm to 400 cm (4 meters) and protect the wearer from accidental collisions. To estimate how fast an object is approaching, the difference between the old and new distance readings is used; as soon as any object approaches the user quickly, an ultra alert is triggered and the corresponding motor vibrates to prompt the user to move left, right, back, or forward.
Hardware Build Instructions and Code [ Protector ]
First, start with the following circuit layout for reference:
The pin diagram of the Black Pill is shown below.
The image of the circuit implemented in the belt:
- Connect the Black Pill to the Arduino IDE, go to File > Preferences > Additional Boards Manager URLs, and paste the following .json link: https://raw.githubusercontent.com/stm32duino/BoardManagerFiles/main/package_stmicroelectronics_index.json
- Download the STM32Cube software from the following link, STM32Cube Integrated Development Environment for STM32, and place it inside the Arduino folder.
- Now select the board in the Arduino IDE from Tools > Board > STM32 based MCUs > Generic STM32F4 series.
- Start with the conditions for notification through vibration feedback. Here the lower range is 50 cm, but 30 cm is more effective for this subsystem.
- Define the variables and the pin assignments for the Black Pill setup:
//left Top
int time_durationA=0;
int distance_oldA,distance_newA;
int delA,distanceCmA;
int left_top = PA0;
int trig_A = PB13;
int echo_A = PB14;
//right Top
int time_durationB=0;
int distance_oldB,distance_newB;
int delB,distanceCmB;
int right_top = PA1;
int trig_B = PB15;
int echo_B = PA8;
//left back
int time_durationC=0;
int distance_oldC,distance_newC;
int delC,distanceCmC;
int left_back = PA2;
int trig_C = PA9;
int echo_C = PA10;
//right back
int time_durationD=0;
int distance_oldD,distance_newD;
int delD,distanceCmD;
int right_back = PA3;
int trig_D = PA11;
int echo_D = PA15;
Three pin variables are required for each ultrasonic sensor along with one vibration feedback coin, for the left front, right front, left back, and right back positions.
- The void setup() of the Arduino code configures the following inputs and outputs:
void setup() {
    pinMode(left_top, OUTPUT);   // left top vibration motor
    pinMode(echo_A, INPUT);      // left top ultrasonic echo
    pinMode(trig_A, OUTPUT);     // left top ultrasonic trigger
    pinMode(right_top, OUTPUT);  // right top vibration motor
    pinMode(echo_B, INPUT);      // right top ultrasonic echo
    pinMode(trig_B, OUTPUT);     // right top ultrasonic trigger
    pinMode(left_back, OUTPUT);  // left back vibration motor
    pinMode(echo_C, INPUT);      // left back ultrasonic echo
    pinMode(trig_C, OUTPUT);     // left back ultrasonic trigger
    pinMode(right_back, OUTPUT); // right back vibration motor
    pinMode(echo_D, INPUT);      // right back ultrasonic echo
    pinMode(trig_D, OUTPUT);     // right back ultrasonic trigger
    Serial.begin(9600);
}
- The code below calculates the distance from each ultrasonic sensor's trigger and echo, and the change in position between readings. You can customise the threshold values according to the required range and the ultrasonic sensor's capability; a ToF sensor can also be used for better accuracy. The distance in centimetres is obtained from the echo time as duration_in_microseconds * 0.034 / 2, since sound travels roughly 0.034 cm per microsecond and the pulse covers the distance twice. The following code goes inside void loop():
// Each sensor is triggered and read in turn so that the echoes do not interfere,
// using the standard 10-microsecond trigger pulse for HC-SR04-style sensors.
distance_oldA = distance_newA; // previous readings, used to estimate how fast an object approaches
distance_oldB = distance_newB;
distance_oldC = distance_newC;
distance_oldD = distance_newD;

// Sensor A (left top)
digitalWrite(trig_A, LOW);
delayMicroseconds(2);
digitalWrite(trig_A, HIGH);
delayMicroseconds(10);
digitalWrite(trig_A, LOW);
time_durationA = pulseIn(echo_A, HIGH);
distanceCmA = (time_durationA * 0.034) / 2;
distance_newA = distanceCmA;

// Sensor B (right top)
digitalWrite(trig_B, LOW);
delayMicroseconds(2);
digitalWrite(trig_B, HIGH);
delayMicroseconds(10);
digitalWrite(trig_B, LOW);
time_durationB = pulseIn(echo_B, HIGH);
distanceCmB = (time_durationB * 0.034) / 2;
distance_newB = distanceCmB;

// Sensor C (left back)
digitalWrite(trig_C, LOW);
delayMicroseconds(2);
digitalWrite(trig_C, HIGH);
delayMicroseconds(10);
digitalWrite(trig_C, LOW);
time_durationC = pulseIn(echo_C, HIGH);
distanceCmC = (time_durationC * 0.034) / 2;
distance_newC = distanceCmC;

// Sensor D (right back)
digitalWrite(trig_D, LOW);
delayMicroseconds(2);
digitalWrite(trig_D, HIGH);
delayMicroseconds(10);
digitalWrite(trig_D, LOW);
time_durationD = pulseIn(echo_D, HIGH);
distanceCmD = (time_durationD * 0.034) / 2;
distance_newD = distanceCmD;

delay(2000); // pause between measurement cycles; this also sets the interval for the speed estimate

delA = abs(distance_newA - distance_oldA);
delB = abs(distance_newB - distance_oldB);
delC = abs(distance_newC - distance_oldC);
delD = abs(distance_newD - distance_oldD);
- The basic protection and notification logic for one sensor is given by:
// For Sensor A at the left top position
if (delA > 10 && 50 < distanceCmA && distanceCmA < 250) {
    analogWrite(left_top, 255); // ultra alert: something is approaching fast
    delay(6000);
}
if (250 < distanceCmA && distanceCmA < 350) {
    analogWrite(left_top, 60);
    delay(2000);
}
if (150 < distanceCmA && distanceCmA < 250) {
    analogWrite(left_top, 120);
    delay(2000);
}
if (100 < distanceCmA && distanceCmA < 150) {
    analogWrite(left_top, 180);
    delay(2000);
}
if (30 < distanceCmA && distanceCmA < 100) {
    analogWrite(left_top, 255);
    delay(4000);
}
Similarly, the code can be written for all the other ultrasonic sensors and positions, or an interrupt-based method can be used. The distance is checked on each iteration of the void loop() function, ensuring that the measurements are processed continuously. You can find more efficient code at the GitHub link.
The working of the Protector for one ultrasonic sensor can be seen here. Similarly, it can be done for all the other sensors as explained in the code.
The primary objective of the Protector device was to prevent accidental collisions for visually impaired individuals by detecting and responding to nearby objects that may pose a threat.
The vibration alerts the user and indicates the direction of the obstacle (left, right, back, or front) so it can be avoided; the device treats the user as a system and considers everything within a 30 cm radius to be part of that system, providing a simple yet effective solution to enhance user safety.
THE ASSEMBLY
The components of all three subsystems are kept inside a belt designed to be comfortable, and the pair of cameras is attached to a cap along with the speaker. [The wired setup is a bit less comfortable to use; if the budget allows, a wireless camera and speaker can be used instead.]
- The future vision for Nav-VIS involves several key advancements to enhance its functionality and user experience. One of the primary goals is to integrate voice control and descriptions of the surroundings, allowing users to interact with the device more naturally and intuitively, like a personal assistant, making it easier for visually impaired individuals to operate the system and ask Nav-VIS about their surroundings, such as the names of people, descriptions of things in front of them, or any scene.
- Additionally, custom PCB designs will be developed to consolidate and optimize all hardware components, reducing the overall size and weight of the device and thereby increasing comfort and wearability. Introducing advanced RealSense cameras and NVIDIA Jetson-level single-board computers would enable improved depth estimation with greater accuracy, allowing users to better navigate complex environments and recognize people around them.
- Moreover, by incorporating libraries like EasyOCR, the device will offer more precise and efficient text recognition, further aiding in identifying products, reading currency notes, accessing information from books or newspapers, and giving a complete description of edible products and their nutrient content. These upgrades will collectively make Nav-VIS a more powerful and versatile tool for visually impaired individuals, significantly improving their independence and quality of life.