Published August 27, 2016 © MIT

Smart Cap: Vision For The Visually Impaired

Smart Cap is an assistant for the visually impaired that narrates a description of the scene in front of them.

Intermediate | Full instructions provided | Over 2 days | 18,157

Best Alexa Skills Kit with Raspberry Pi

The Internet of Voice Challenge


Things used in this project

Hardware components

Amazon Alexa Amazon Echo

Raspberry Pi 3 Model B

Creative USB webcam

Any webcam or RPi camera module can be used

Power Bank 5 V

Any 5 V power bank can power the Raspberry Pi, preferably 1,000 mAh or higher

Software apps and online services

Amazon Web Services AWS DynamoDB

Amazon Web Services AWS Lambda

Amazon Alexa Alexa Skills Kit

Microsoft Azure

Microsoft Cognitive Services is used for image recognition

Story

Smart Cap is an assistant for the visually impaired that narrates a description of the scene in front of them.

Listen below to the challenges that Bhupati faces because of his visual impairment and how Smart Cap can help resolve some of them.

Why:

Figure: Problem faced by visually impaired people

There are about 285 million visually impaired people in the world. They are not able to experience the world the way sighted people do. Smart Cap aims to provide this missing experience for them. The system uses state-of-the-art deep learning techniques from Microsoft Cognitive Services for image classification and tagging. The experience is powered by the voice assistant 'Alexa' through Amazon Echo.

What:

Smart Cap aims to bring the beautiful world to the visually impaired as a narrative. The narrative is generated by converting the scene in front of the user into text that describes the important objects in it. Examples of such text include 'A group of people playing a game of football', 'yellow truck parked next to the car', and 'a bowl of salad kept on table'. In the first prototype of the system, one line along with some keywords is played as audio to the user; later versions would add a more detailed description as a feature.
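The "one line along with some keywords" output described above can be sketched as a small formatting step. This is only an illustration, not the project's actual code: the function name `compose_narration`, the fallback message, and the `max_keywords` limit are hypothetical, and the input shape (a `description` object holding `captions` with `text` and `confidence`, plus a list of `tags`) is an assumption about what the image description service returns.

```python
def compose_narration(analysis, max_keywords=5):
    """Turn a caption-style analysis dict into one spoken line plus keywords."""
    # Assumed response shape: {"description": {"captions": [...], "tags": [...]}}
    description = analysis.get("description", {})
    captions = description.get("captions", [])
    tags = description.get("tags", [])

    if not captions:
        # Nothing recognisable: fall back to a fixed, hypothetical message.
        return "I could not describe the scene."

    # Use the caption the service was most confident about.
    best = max(captions, key=lambda c: c.get("confidence", 0.0))
    sentence = best["text"].strip()
    # Capitalise only the first letter so proper nouns keep their casing.
    sentence = sentence[:1].upper() + sentence[1:]

    keywords = tags[:max_keywords]
    if keywords:
        return f"{sentence}. Keywords: {', '.join(keywords)}."
    return f"{sentence}."
```

Capping the keyword list keeps the spoken output short, which matters more for audio than it would for on-screen text.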

How:

Figure: VUI Diagram

The architecture of the system includes an Amazon Echo, a Raspberry Pi (versions 1, 2, and 3 will all work), and online computer vision APIs.

A webcam retrofitted into a regular cap is connected to the Raspberry Pi. The code given here runs on the Raspberry Pi. It captures an image from the webcam and sends it to the Microsoft APIs for recognition. The response is then inserted into DynamoDB. When the user asks Alexa to describe the scene, the Alexa Skills Kit triggers an AWS Lambda function to fetch the data from the database (DynamoDB). The resulting text is then played as audio on the Alexa device.
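The data flow above can be sketched in a few functions. This is a minimal sketch under stated assumptions, not the project's published code: `publish_description`, `handle_describe_scene`, `build_alexa_response`, the `device_id` key, and the item layout are all hypothetical names chosen for illustration. The webcam capture, the vision API call, and the DynamoDB access are passed in as plain callables so the flow can be read and run without any hardware or cloud account. Only the response envelope (`version`, `response`, `outputSpeech`, `shouldEndSession`) follows the Alexa Skills Kit JSON format.

```python
import time


def publish_description(image_bytes, describe_image, put_item,
                        device_id="smart-cap-1", now=None):
    """Raspberry Pi side: describe one captured frame and store the result."""
    # describe_image stands in for the call to the recognition API.
    text = describe_image(image_bytes)
    item = {
        "device_id": device_id,  # hypothetical key identifying the cap
        "timestamp": int(time.time() if now is None else now),
        "description": text,
    }
    # put_item stands in for the database write.
    put_item(item)
    return item


def build_alexa_response(text):
    """Wrap plain text in the Alexa Skills Kit JSON response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def handle_describe_scene(get_latest_item):
    """Lambda side: fetch the newest stored description and speak it."""
    # get_latest_item stands in for the database read.
    item = get_latest_item()
    if item is None:
        return build_alexa_response("I do not have a description yet.")
    return build_alexa_response(item["description"])
```

In a real deployment the injected callables would presumably wrap the webcam capture, an HTTP request to the Microsoft service, and DynamoDB reads and writes. Keeping them as parameters decouples the cap from the voice interface: the Raspberry Pi only writes, the Lambda function only reads, and the database is the single point of contact between them.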

Note: The skill is live on the Alexa Skills Kit. You can enable it through the mobile app or directly by voice command. Here is the link: http://alexa.amazon.com/spa/index.html#skills/dp/B01HZ9AETK/?ref=skill_dsk_skb_sr_0

Future Work:

Form factors: Glasses with a camera to take pictures

Functionalities: Face and emotion recognition; text to speech for books, sign boards, and restaurant menus; indoor navigation with visual SLAM; outdoor navigation with GPS; and traffic light color interpretation.

Figure: Additions to smart cap in future