For individuals with visual impairments, shopping in a mall or buying groceries can be a daunting challenge. Relying on others for these tasks can significantly impact a person's confidence and sense of independence. On the other hand, the ability to shop independently is not just practical but also deeply fulfilling.
To address this need, I propose a novel solution: smart spectacles designed specifically to empower visually impaired individuals to navigate and shop freely in unfamiliar environments, such as a new mall. This approach will enhance autonomy, confidence, and the overall shopping experience of the visually impaired person.
In version 1, the mall in question will be equipped with these smart eyewear units, a computer acting as a centralized server, and a preloaded map of the mall.
Design Concept
My objective is to create smart glasses that enable shopping and currency recognition for visually impaired persons (VIPs). The glasses use a video stream to identify products on store shelves. Cross-referencing each detected product with a database provides details such as composition, price, and other relevant information. This information is then delivered to the user through auditory feedback, empowering them to make informed purchases.
The second module of the glasses makes it easier for VIPs to navigate malls. For this, the eyewear uses its built-in camera to detect visual markers placed on the floor and determine the path; the shortest path to the intended destination is found using the A* algorithm.
Version 1:
- Identifying products on the shelves;
- Navigation through haptic feedback so VIPs can reach a chosen destination;
- An app that stores the person's shopping history locally.
Version 2:
- Adding IR-based sensors for real-time detection of objects that are not already on the map;
- Creating a decentralized server: the user simply scans a QR code and a local server is spun up on their mobile device to handle all the computation;
- Making the app more capable of maintaining records of the person's shopping history;
- Using a QR-based system to connect the mobile app directly with the eyewear.
Current Status
Right now, the project is at version 1. The glasses are able to detect several items on a shelf, after which each product is recognized and identified individually.
The eyewear will provide all product information through audio.
Haptic feedback is used to guide the VIP through the mall. The haptic motors and speaker have not yet been physically integrated because the necessary hardware is unavailable; during testing, the instructions are verified on a serial monitor.
We have developed a novel pipeline for the product recognition module that consists of three stages: first, detect the multiple products placed on a densely packed store shelf; second, localize and preprocess (crop) each detected product to the desired size; and third, identify and recognize each product.
For object detection and localization in dense environments such as shelves, I used a YOLO-NAS model trained on the SKU-110K dataset, which contains thousands of photos captured in real retail stores. The results are very convincing: every product facing the front of the shelf is detected.
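For reference, the snippet below is a minimal sketch of this detection stage using the super-gradients library, which provides YOLO-NAS. The checkpoint path and image name are placeholders rather than the project's actual files.

from super_gradients.training import models

# Small YOLO-NAS variant; num_classes=1 because SKU-110K labels every item
# simply as a generic object to be detected on the shelf.
model = models.get(
    "yolo_nas_s",
    num_classes=1,
    checkpoint_path="checkpoints/yolo_nas_sku110k.pth",  # hypothetical path
)

# Run inference on a shelf image; each bounding box is then cropped out and
# passed to the identification stage described below.
predictions = model.predict("shelf.jpg", conf=0.4)
predictions.show()  # quick visual sanity check of the detected boxes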
The next step in the dataflow is identifying each product. For this, we use a machine learning model to extract features from the product image: we apply transfer learning to the VGG-16 model, adding a few extra layers and making some modifications to suit our task.
During testing, whenever a new product is detected, the trained model extracts its features, and K-Nearest Neighbors then returns the two most similar items from the catalogue as the output.
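The sketch below illustrates this identification stage. It uses the stock VGG-16 ImageNet weights as a feature extractor and scikit-learn's NearestNeighbors for the lookup; the actual project adds extra layers to VGG-16, and the catalogue files named here are placeholders.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

# VGG-16 without its classifier head, pooled to a 512-dimensional embedding.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]

# Precomputed embeddings and names of the known products (placeholder files).
catalogue_features = np.load("catalogue_features.npy")
catalogue_names = np.load("catalogue_names.npy", allow_pickle=True)

# Return the two most similar catalogue entries for a cropped product image.
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(catalogue_features)
_, idx = knn.kneighbors([embed("cropped_product.jpg")])
print("Closest matches:", catalogue_names[idx[0]])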
The same approach is applied for currency detection.
The code for both modules is provided below.
Flask Server
All computation related to product identification will eventually be carried out on a decentralized server. For now, version 1 uses a Python-based Flask server to host all services and execute instructions.
Currently, each time the user issues a voice command to submit image data, the camera in the eyewear sends the image to the server hosted on a centralized system.
After the product recognition models have run on the server, the resulting output is sent back to the eyewear over the internet, and audible feedback is then generated from that output.
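A hedged sketch of what such a recognition endpoint can look like is shown below. The route name, request format, and helper calls are illustrative assumptions, not the project's exact implementation.

import io

from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize():
    # The eyewear sends the raw JPEG bytes in the request body.
    img = Image.open(io.BytesIO(request.data)).convert("RGB")

    # Placeholder calls into the pipeline described above:
    # boxes = detect_products(img)                      # YOLO-NAS stage
    # names = [identify(crop(img, b)) for b in boxes]   # VGG-16 + KNN stage
    names = ["example product"]                         # stand-in result

    # The device turns this text into audible feedback.
    return jsonify({"products": names}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)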
Collecting Audio Input
Audio data is collected from the user with the XIAO ESP32S3 Sense in five-second chunks, for a total of ten seconds. Version 1 currently uses keyword spotting to find particular command words in the user's input.
The audio data is translated into text using DeepSpeech, invoked through the Flask server.
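Below is a minimal sketch of this speech-to-text step with Mozilla DeepSpeech's Python package, followed by a naive keyword check. The model and scorer file names refer to the standard DeepSpeech 0.9.3 release artifacts; the keyword list is illustrative.

import wave
import numpy as np
import deepspeech

model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

def transcribe(wav_path):
    # DeepSpeech expects 16-bit, 16 kHz, mono PCM audio.
    with wave.open(wav_path, "rb") as wf:
        audio = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    return model.stt(audio)

KEYWORDS = {"identify", "navigate", "price"}   # illustrative command words

text = transcribe("command.wav")
commands = KEYWORDS.intersection(text.lower().split())
print(text, "->", commands)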
To make the eyewear work with our app, we need a way to connect the application with the hardware, which we do by sending data through the cloud.
The flowchart below explains the basic flow of data and the working of the eyewear.
Let's now look at how to send data to our Firebase console using the Blues Notecarrier-F and Notecard WiFi. Before using the Notecard, we need to create an account on Notehub.io, which acts as an intermediary between the Notecard and our database.
To send and receive data, we have specific routes defined in our Flask application.
from datetime import datetime
import logging

from flask import Flask, request, jsonify
from google.cloud import firestore  # assumes the Firestore client library; the original setup may differ

app = Flask(__name__)
db = firestore.Client()

@app.route('/notehubWebhook', methods=['POST'])
def notehub_webhook():
    logging.debug('Received POST request')
    raw_data = request.data
    logging.debug(f'Raw data: {raw_data}')
    try:
        data = request.get_json()
        logging.debug(f'Received data: {data}')
        if data is None:
            raise ValueError("No JSON data received")
        # Timestamp each event before persisting it to the database
        data['creationTimestamp'] = datetime.utcnow()
        db.collection('notehub-data').add(data)
        return jsonify({"status": "success", "message": "Data received and saved"}), 200
    except Exception as e:
        logging.error(f'Error processing request: {e}')
        return jsonify({"status": "error", "message": str(e)}), 500
When this route is called, the Notecard's JSON data is posted to our server. To set this up, first register the Notecard on Notehub.io, then go to the Routes section and create a route pointing at this endpoint.
Notehub routes the data from the Notecard to our database. A similar procedure is used to pull data back from the database: another route is created for this task in our Flask application and then in Notehub.
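For illustration, the snippet below shows the shape of the note.add request a device sends through the Notecard to reach Notehub (and, via the route above, our webhook). It uses the note-python library over I2C; the actual eyewear firmware on the STM32 would use the equivalent Notecard C/Arduino library, and the Notefile name is hypothetical.

import notecard
from periphery import I2C

port = I2C("/dev/i2c-1")               # bus path depends on the host board
card = notecard.OpenI2C(port, 0, 0)

req = {
    "req": "note.add",
    "file": "shopping.qo",             # hypothetical Notefile name
    "body": {"event": "product_scanned", "product": "example"},
    "sync": True,                      # push to Notehub immediately
}
rsp = card.Transaction(req)
print(rsp)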
Navigation Inside Malls
We have employed coin vibration motors (haptic motors) near each ear to give blind users directions through gentle vibrations; which side vibrates indicates which way the person should go.
Visual markers on the floor are used to support navigation for the blind: all of the mall's hallways will have tapes laid side by side in two different colors.
A camera mounted on the eyewear picks up these markers to determine whether the person is traveling in the right direction, while IMU sensors are used for dead reckoning to localize the person inside the mall.
Kalman filters can then be used to fuse the orientation and velocity estimates derived from the visual markers with the IMU data, producing a more precise dead-reckoning estimate of the person's position.
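As a simplified illustration of this fusion, the sketch below runs a one-dimensional Kalman filter on heading only: the gyro rate drives the prediction step, and each detected floor marker supplies an absolute heading correction. The noise constants are illustrative, not tuned values from the project.

import numpy as np

class HeadingKF:
    def __init__(self, q=0.01, r=0.5):
        self.theta = 0.0   # heading estimate (rad)
        self.p = 1.0       # estimate variance
        self.q = q         # process noise (gyro drift)
        self.r = r         # measurement noise (marker orientation)

    def predict(self, gyro_rate, dt):
        # Integrate the gyro's angular rate between marker sightings.
        self.theta += gyro_rate * dt
        self.p += self.q

    def update(self, marker_heading):
        # Correct the drifted estimate with the marker-derived heading.
        k = self.p / (self.p + self.r)          # Kalman gain
        self.theta += k * (marker_heading - self.theta)
        self.p *= (1.0 - k)
        return self.theta

kf = HeadingKF()
kf.predict(gyro_rate=0.1, dt=0.05)              # IMU step
print(kf.update(marker_heading=np.deg2rad(5)))  # marker correction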
Detecting Visual Markers
The visual marker's orientation with respect to the image tells us the actual orientation of the person, while the shift in position of certain feature points between two consecutive images provides velocity data.
Our method uses computer vision to identify the visual markers on the floor. To detect the tapes, we first preprocess the image by smoothing it with a Gaussian filter and then apply Canny edge detection followed by a Hough transform. However, running this full pipeline on every frame is computationally expensive.
Therefore, to minimize processing time, we identify the tapes using HSV thresholding and run the full edge-detection pipeline only once every ten seconds. This cuts down processing time without significantly sacrificing accuracy.
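A rough OpenCV sketch of this detection path is given below: an HSV threshold isolates one tape colour, then Canny plus a probabilistic Hough transform recover the tape's orientation (in practice this heavier step would run only periodically, as described above). The HSV range is a placeholder and would need tuning to the actual tape colour and lighting.

import cv2
import numpy as np

def detect_tape(frame_bgr):
    # Isolate the tape colour in HSV space (placeholder range for a red tape).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))

    # Periodic refinement: smooth, find edges, then fit line segments.
    blurred = cv2.GaussianBlur(mask, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=80, maxLineGap=10)
    if lines is None:
        return None
    # Use the longest segment as the tape direction in the image.
    x1, y1, x2, y2 = max(lines[:, 0],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    return np.degrees(np.arctan2(y2 - y1, x2 - x1))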
Starting from a known position, we use the A* algorithm to give the user turn-by-turn guidance to the destination.
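The sketch below shows a minimal A* search over a grid map of the mall (0 = walkable, 1 = blocked) with a Manhattan-distance heuristic; the grid itself is a placeholder for the preloaded mall map.

import heapq

def a_star(grid, start, goal):
    def h(a, b):                       # Manhattan-distance heuristic
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    open_set = [(h(start, goal), 0, start, [start])]
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path                # cells from start to goal
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(open_set,
                               (g + 1 + h((nr, nc), goal), g + 1, (nr, nc),
                                path + [(nr, nc)]))
    return None                        # no route found

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))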
Power Supply and Schematics
Both the STM32F411CE (Black Pill) and the ESP32S3 Sense require a 3.3 V supply. We use a conventional voltage regulator to provide a steady voltage and protect the boards from over-voltage.
The battery's positive and negative terminals are connected to the voltage regulator's input and ground terminals, respectively.
The 3.3 V pins of the ESP32S3 Sense and the Black Pill are then connected to the regulator's 3.3 V output, and both devices' grounds are tied to the regulator's output ground.
The overall pinout scheme of our device is shown in the image below.
For the MPU-9050 Sensor (IMU):
- MPU-9050 VCC → 3.3V on STM32F411
- MPU-9050 GND → GND on STM32F411
- MPU-9050 SCL → PB6 on STM32F411
- MPU-9050 SDA → PB7 on STM32F411
- MPU-9050 AD0 → GND (0x68) or VCC (0x69)
For ESP32S3-SENSE:
- RX/TX to corresponding TX/RX on STM32F411CE
For Notecard and Notecarrier:
- Notecard SCL → STM32F411 PB6 (I2C1 SCL)
- Notecard SDA → STM32F411 PB7 (I2C1 SDA)
- Notecard V+ → 5V USB Power Supply or 3.3V from STM32F411
- Notecard GND → GND of STM32F411