For a visually impaired person in a country like mine, basic day-to-day activities such as identifying which bus is approaching the stop or finding the sidewalk can be highly tedious. The underlying causes are poor infrastructure and inconsistent regulatory enforcement. Overgrown trees, cars parked on the sidewalk, and uncovered canals beside footpaths are a few examples of why navigating without assistance is so difficult. Spatial awareness through auditory cues can give visually impaired people the confidence to explore and interact with the world, and reducing the need to ask strangers for help builds that confidence further. Sometimes the inconvenience comes from small things, such as the lack of tactile bumps on appliances or a prescription sticker applied over the braille label. My system aims to reduce the dependence of visually impaired people on strangers for the many trivial tasks they would otherwise need help with.
Solution
My proposed solution is a system that analyses the surroundings in real time. Neural networks generate auditory cues from the camera feed, but feedback is only given when the system detects an obstacle on a likely collision course with the user, so the audio stays sparse and relevant. The user selects between different tasks using tactile buttons; this spares them from talking out loud in public and minimizes the need for frequent interaction with strangers. The reprogrammable tactile buttons can be mapped to tasks such as reading the route text on a bus, identifying a nearby government public servant, or locating the nearest sidewalk, depending on the issues the individual commonly faces. A specifically trained YOLO model with optimized weights performs these detections.
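A minimal sketch of how the reprogrammable buttons could dispatch tasks, assuming a Raspberry Pi-class board with the gpiozero library; the GPIO pin numbers and handler names are hypothetical placeholders, not a fixed design.

```python
# Hypothetical sketch: map physical tactile buttons to tasks.
# Assumes a Raspberry Pi-class board and the gpiozero library;
# pin numbers and task handlers are illustrative placeholders.
from gpiozero import Button
from signal import pause

def read_bus_text():
    print("Task: read the route text on the approaching bus")

def find_sidewalk():
    print("Task: locate the nearest sidewalk")

def find_public_servant():
    print("Task: identify a nearby government public servant")

# Reprogrammable mapping: swap handlers to suit the individual user.
TASKS = {17: read_bus_text, 27: find_sidewalk, 22: find_public_servant}

buttons = []
for pin, handler in TASKS.items():
    btn = Button(pin)           # one tactile button per GPIO pin
    btn.when_pressed = handler  # fires the task without any speech input
    buttons.append(btn)         # keep references so callbacks stay alive

pause()  # block, waiting for button presses
```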
Object Detection: YOLOv5 is the detection method used by the application. It is an essential component of the software, tracking and identifying objects in the immediate environment.
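A minimal sketch of the detection loop, assuming the public ultralytics/yolov5 hub model and a webcam frame from OpenCV; the collision heuristic (a large, centered bounding box) is an illustrative assumption, not the project's trained model or logic.

```python
# Illustrative sketch: YOLOv5 inference gated by a simple collision check.
# Uses the public ultralytics/yolov5 hub model; the custom-trained
# weights described above would be loaded instead in the real system.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # pretrained COCO weights

def imminent_obstacles(frame, area_ratio=0.15):
    """Return detections large and centered enough to suggest a collision."""
    h, w = frame.shape[:2]
    results = model(frame[..., ::-1])  # BGR -> RGB for the model
    warnings = []
    for *xyxy, conf, cls in results.xyxy[0].tolist():
        x1, y1, x2, y2 = xyxy
        box_area = (x2 - x1) * (y2 - y1)
        center_x = (x1 + x2) / 2
        # Heuristic (assumption): a box filling >15% of the frame and
        # sitting in the middle third of the view is on a collision path.
        if box_area / (w * h) > area_ratio and w / 3 < center_x < 2 * w / 3:
            warnings.append(model.names[int(cls)])
    return warnings

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    for name in imminent_obstacles(frame):
        print(f"Obstacle ahead: {name}")
cap.release()
```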
Large Language Model (LLM): GPT-4o is the application's LLM. It enables the application to comprehend and produce natural language.
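A minimal sketch of a GPT-4o call through the OpenAI Python SDK (v1+); feeding it noisy detected bus text and the prompt wording are assumptions about how the pieces connect, not the project's actual pipeline.

```python
# Sketch: turn raw detected text into one short, spoken-friendly sentence.
# Assumes the OpenAI Python SDK (v1+) and an OPENAI_API_KEY in the
# environment; the prompt and helper name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def describe_bus_text(raw_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Rephrase noisy OCR text from a bus sign into one "
                        "short, clear sentence for a visually impaired rider."},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content
```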
Text-to-Speech (TTS): The program converts text to speech using gTTS. Delivering information audibly is a crucial feature for visually impaired users.
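A minimal sketch of gTTS producing an audio cue; synthesizing to an MP3 file and playing it back with the mpg123 command-line player is one possible approach, and the playback command is a platform assumption.

```python
# Sketch: speak a cue with gTTS. gTTS synthesizes speech to an MP3 file;
# playback here via the 'mpg123' CLI is a platform-dependent assumption.
import subprocess
from gtts import gTTS

def speak(text: str, lang: str = "en") -> None:
    tts = gTTS(text=text, lang=lang)  # gTTS needs an internet connection
    tts.save("cue.mp3")
    subprocess.run(["mpg123", "-q", "cue.mp3"], check=False)

speak("Bus 204 to City Centre is approaching.")
```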
PoC
Comments