The VisionAid project aims to enhance the daily lives of visually impaired individuals by providing real-time audio descriptions of their surroundings. By using a Raspberry Pi camera for visual input, a Raspberry Pi Zero for processing, and a Seeed Studio speaker for audio output, VisionAid will help users navigate their environment with increased confidence and independence.
2. Objectives
- Real-Time Environment Detection: Implement object detection and scene recognition using the Raspberry Pi camera.
- Audio Feedback: Provide real-time audio descriptions of detected objects and scenes via the Seeed Studio speaker.
- User-Friendly Interface: Ensure easy operation through voice commands or simple button inputs (a button-input sketch follows this list).
- Portable and Efficient: Design the device to be lightweight, power-efficient, and easy to carry.
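As one way to meet the button-input objective above, the sketch below toggles audio announcements with a single push button. It is a minimal sketch rather than part of the project as specified: the GPIO pin number (17), the gpiozero library, and the announcements_enabled flag are illustrative assumptions.

# Minimal sketch: toggle audio announcements with one push button (gpiozero).
# Assumes a momentary button wired between GPIO 17 and ground.
from signal import pause
from gpiozero import Button

announcements_enabled = True  # hypothetical flag that the detection loop would check

def toggle_announcements():
    global announcements_enabled
    announcements_enabled = not announcements_enabled
    print("Announcements on" if announcements_enabled else "Announcements off")

button = Button(17)                        # uses the internal pull-up by default
button.when_pressed = toggle_announcements # run the toggle on each press

pause()                                    # keep the script alive, waiting for presses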
3. Hardware Components
- Raspberry Pi Zero: Acts as the system's main computer, running the detection and speech software.
- Raspberry Pi Camera: Captures live video feed for processing.
- Seeed Studio Speaker: Delivers audio descriptions to the user.
- Portable Power Supply: A rechargeable battery to power the device.
4. Software Components
- Operating System: Use Raspberry Pi OS (formerly Raspbian), the standard lightweight distribution for the Raspberry Pi Zero.
- Object Detection: Run a pre-trained neural network (e.g., YOLO or MobileNet) to identify objects in the camera feed, using OpenCV's DNN module (as in the sample code below), TensorFlow, or PyTorch.
- Audio Conversion: Use text-to-speech (TTS) software such as eSpeak or gTTS to convert detected objects and scenes into spoken language (a TTS sketch follows this list).
- User Interface: Create a simple UI for configuration and operation, accessible via command-line or a minimal touchscreen interface.
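To make the audio-conversion step concrete, here is a minimal sketch of the two TTS options mentioned above. It assumes eSpeak and mpg123 are installed via apt and gTTS via pip; the helper names speak_offline and speak_online are illustrative, not part of the project.

# Minimal TTS sketch: offline via eSpeak, online via gTTS.
# Assumes: `sudo apt install espeak mpg123` and `pip install gTTS`.
import subprocess

def speak_offline(text):
    # eSpeak synthesizes speech locally, so it works without a network connection.
    subprocess.run(["espeak", "-s", "150", text], check=True)

def speak_online(text):
    # gTTS uses Google's online TTS service; it needs internet but sounds more natural.
    from gtts import gTTS
    gTTS(text=text, lang="en").save("/tmp/visionaid.mp3")
    subprocess.run(["mpg123", "-q", "/tmp/visionaid.mp3"], check=True)

if __name__ == "__main__":
    speak_offline("Detected a chair ahead")

eSpeak keeps everything offline, which matters for a portable device; gTTS sounds more natural but requires a network connection.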
5. Sample Code
Here is a basic outline for setting up object detection and audio output:
import cv2
import numpy as np
import pyttsx3
from time import sleep

# Initialize camera and text-to-speech engine
camera = cv2.VideoCapture(0)
engine = pyttsx3.init()
engine.setProperty('rate', 150)  # Speed of speech (words per minute)

# Load the YOLO network and the COCO class labels
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
output_layers = net.getUnconnectedOutLayersNames()
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f]

def detect_objects():
    ret, frame = camera.read()
    if not ret:
        return  # Skip this cycle if the camera did not return a frame
    height, width, _ = frame.shape

    # Prepare the image for YOLO: scale pixels to [0, 1], resize to 416x416, swap BGR to RGB
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Collect detections above the confidence threshold
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Suppress overlapping boxes so each object is announced only once
    indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    # Announce each detected object through the speaker
    for i in np.array(indices).flatten():
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        print(f"Detected {label} with {confidence:.2f} confidence")
        engine.say(f"Detected {label}")
        engine.runAndWait()

while True:
    detect_objects()
    sleep(2)  # Pause before the next detection cycle
6. Testing and Validation
- Field Testing: Test VisionAid in various environments (e.g., indoors and outdoors) to ensure the reliability and accuracy of the object detection and audio feedback (a simple logging sketch for reviewing accuracy follows this list).
- User Feedback: Gather input from visually impaired individuals to fine-tune the audio descriptions and usability.
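To support the field-testing step above, detections can be written to a simple log so accuracy can be reviewed after each session. This is a minimal sketch; the file name visionaid_log.csv and the log_detection helper are hypothetical additions for illustration.

# Minimal field-test logger: append each detection to a CSV file for later review.
import csv
from datetime import datetime

LOG_PATH = "visionaid_log.csv"  # hypothetical log location

def log_detection(label, confidence):
    # One row per announcement: timestamp, class label, model confidence.
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), label, f"{confidence:.2f}"])

# Example: call this next to engine.say(...) inside detect_objects()
log_detection("person", 0.87)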
VisionAid leverages modern technology to empower visually impaired users, enhancing their ability to perceive and interact with the world around them. By combining object detection with real-time audio feedback, this device can significantly improve users' autonomy and safety.
This approach can serve as the foundation for further development, potentially including advanced features like facial recognition, GPS integration, and more.