These “Vision Glasses” Help the Blind Read by Transcribing Text to Audio
Akhil Nagori designed Vision Glasses V2 to help those with visual impairments read any text at the press of a button.
Disabilities make life difficult in many ways that are invisible to the more fortunate among us. You know, of course, that stairs are a problem for people in wheelchairs. But have you ever considered how those same people deal with everyday tasks like bathing or getting into and out of bed? Those with visual impairments face many challenges that the rest of us might not even notice, such as determining the denomination of a banknote or reading a letter received in the mail. That’s why student Akhil Nagori designed Vision Glasses V2.
This is a follow-up to Nagori’s previous “Smart AI Glasses” project, and it serves the same purpose: to automatically read text aloud. The world is full of written communication that lacks an accessible alternative, like braille or an audio recording. You need only walk around your neighborhood, or even your own home, to find plenty of proof of that. Nagori’s Vision Glasses V2 provide a simple and convenient solution: the wearer only has to look at whatever they want to read and push a button on the frame of the glasses. The device then converts the detected text into audio.
Cost is always a concern, so Nagori chose off-the-shelf components that are affordable and easy to find. The primary component is a Raspberry Pi Zero 2 W single-board computer, which handles all of the processing and resides in a 3D-printed enclosure attached to the frame of the glasses. When the user presses the button, the Raspberry Pi captures a photo through an Adafruit Zero Spy Camera. After it analyzes the photo and deciphers any text it sees, it uses a text-to-speech function to read that text aloud through two mini speakers. Power comes from a small rechargeable lithium battery.
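Nagori’s own code isn’t reproduced here, but on this hardware the button-to-capture step might look something like the following minimal sketch. It assumes the gpiozero and Picamera2 libraries that ship with Raspberry Pi OS; the GPIO pin number and file path are hypothetical stand-ins, not details from the project.

```python
from signal import pause

from gpiozero import Button
from picamera2 import Picamera2

BUTTON_PIN = 17                  # hypothetical GPIO pin for the button on the frame
IMAGE_PATH = "/tmp/capture.jpg"  # hypothetical path handed off to the OCR stage

camera = Picamera2()
camera.start()                   # start the Zero Spy Camera with its default config

def capture_photo():
    # Grab a still frame for the OCR stage to process
    camera.capture_file(IMAGE_PATH)

button = Button(BUTTON_PIN)
button.when_pressed = capture_photo

pause()                          # idle until the next button press
```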
For all that magic to happen, Vision Glasses V2 needs to do two things: find and transcribe the text in the captured image, and convert that text into speech. The Python code handles the former with calls to docTR (Optical Character Recognition), an open-source library that supports both TensorFlow 2 and PyTorch backends for extracting and parsing text in images. It handles the latter with eSpeak NG, lightweight open-source text-to-speech software that supports more than 100 languages and accents.
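As a rough illustration of how those two pieces fit together (not Nagori’s actual code), the pipeline could look like this sketch, assuming docTR is installed as a local Python package and the espeak-ng command-line tool is on the PATH. The image path is hypothetical and matches the capture sketch above.

```python
import subprocess

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)   # downloads pretrained weights on first use

def read_aloud(image_path="/tmp/capture.jpg"):
    pages = DocumentFile.from_images(image_path)
    result = model(pages)
    text = result.render()               # flatten the detected words into plain text
    if text.strip():
        # Hand the recognized text to the eSpeak NG command-line tool
        subprocess.run(["espeak-ng", text])

read_aloud()
```

Running OCR locally on a Pi Zero 2 W is slow, which would account for the delay Nagori observes between pressing the button and hearing the result.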
There is a bit of a delay between pressing the button and hearing the audio, but Vision Glasses V2 seem quite usable. The design files and code are available for anyone who would like to build their own.