This project was part of Build2Gether 2.0 and was aimed at helping people with weak vision inside the household.
Background: People with weak vision often find it difficult to distinguish things in their household. Weak vision can stem from a number of factors such as retinitis pigmentosa, injuries, cataracts, glaucoma, etc. This often results in an inability to find important objects such as keys, cooking ingredients, remote controls, etc., which makes it hard for people with these conditions to go about their day as usual.
Initial Idea: The idea was to create a wrist wearable with a mounted camera that helps people identify objects. The camera would act as a guide inside the home by providing real-time assistance in the form of speech. The user could simply point at an object and the wearable would use on-device machine learning to speak out the name of the object. The user could further interact with the wearable using voice commands to learn more about the object. This could be done using optimized YOLO models paired with speech synthesis and recognition. The wearable could also be used as a guide while walking inside the house.
Feedback received: From the feedback given by the contest masters, we understood that the feasibility and practicality of such a wearable would be challenging. Specifically, the processing power demand would be huge, and hardware with that much compute in such a small form factor would be quite hard to find. We used this feedback to shift from a wearable to a handheld device, with the processing happening on a separate device with enough compute or in the cloud. We also focused more on indoor applications of our idea so that we could come up with a useful solution.
Evolution (The final idea)! By analyzing the feedback and limiting ourselves to an indoor application of a handheld device, we came up with kitch-Assist!
As the name suggests, this idea is a Kitchen Assistant! It helps you find your ingredients, distinguish between similar-looking items, get ideas on what to cook, make grocery lists, or just have a chat in the kitchen.
How it works:
- The user uses Seeed Studio's XIAO ESP32S3 along with the Sense camera module to see what is in the kitchen.
- DFRobot's Unihiker displays object/ingredient labels and quantities and relays user queries to the chat assistant.
- The chat assistant runs locally on a Minisforum mini-PC powered by AMD Ryzen AI. It uses the information obtained through the ESP32S3 to provide relevant answers.
Although our post describes a proof of concept for the kitchen, this idea can be easily extended beyond the kitchen as well.
Using this solution, users with weak eyesight can identify and distinguish between items and ingredients in the kitchen, have the chat assistant note down points (maybe a grocery list), get the assistant's help while cooking and finding items in the kitchen, and more. The possibilities are endless! The next section gives a deep dive into the technical details and implementation.
The technical realm:
Setting up to get going:
Seeed Studio's XIAO ESP32S3:
A detailed overview of setting up MicroPython on the ESP32 can be found at this link. If you are comfortable with Arduino, you can also visit this repo to learn more about the ESP32S3 and the Sense module. Follow the tutorial at the link to set up your ESP32, and test whether it is working by running the video streaming example. Note that you may sometimes have to stop the streaming server and start it again for the camera to work. The following images show what the finished setup looks like. The ESP32 and Sense module heat up quite a bit during video streaming, so please be careful while handling them.
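To confirm the camera is actually streaming before moving on, a quick check from any machine on the same network can pull a single frame out of the MJPEG stream. The snippet below is a minimal sketch: the IP address and port are placeholders, so use the address printed by the ESP32 when its streaming server starts.

```python
# Minimal sanity check for the ESP32S3 Sense video stream.
# STREAM_URL is an assumption -- replace it with the address the
# ESP32 reports when the streaming server starts.
import requests

STREAM_URL = "http://192.168.1.50:81/stream"  # hypothetical IP and port

def grab_one_frame(url, out_path="frame.jpg", chunk_size=4096):
    """Read the MJPEG stream until one complete JPEG frame is found."""
    buffer = b""
    with requests.get(url, stream=True, timeout=10) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=chunk_size):
            buffer += chunk
            start = buffer.find(b"\xff\xd8")  # JPEG start marker
            end = buffer.find(b"\xff\xd9")    # JPEG end marker
            if start != -1 and end != -1 and end > start:
                with open(out_path, "wb") as f:
                    f.write(buffer[start:end + 2])
                return out_path
    raise RuntimeError("No complete JPEG frame found in the stream")

if __name__ == "__main__":
    print("Saved", grab_one_frame(STREAM_URL))
```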
DFRobot's Unihiker:
The Unihiker comes loaded with all the software needed for development. Simply connect the Unihiker to your PC and go to http://10.1.2.3 in your browser. Open the network settings and connect to the same WiFi network as the ESP32. Once done, go to the file upload section as shown below, select streaming_client.py, and hit upload.
Navigate to the root folder (or the folder you uploaded to) on the Unihiker and check whether the file has been uploaded. Once the file is visible, start the streaming server using the instructions from the ESP32 tutorial. After the server is up and running, tap streaming_client.py on the Unihiker to see the stream on its screen.
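For reference, a minimal sketch of what such a streaming client could look like is shown below. It assumes OpenCV is available on the Unihiker (DFRobot's stock image ships with it) and that the stream URL matches the one from the ESP32 tutorial; the URL is a placeholder.

```python
# streaming_client.py -- minimal sketch of a client for the Unihiker.
# Assumes OpenCV is installed and STREAM_URL points at the ESP32's
# MJPEG stream (placeholder address below).
import cv2

STREAM_URL = "http://192.168.1.50:81/stream"  # replace with the ESP32's address

def main():
    cap = cv2.VideoCapture(STREAM_URL)
    if not cap.isOpened():
        raise RuntimeError("Could not open stream: " + STREAM_URL)

    # Full-screen window so the feed fills the Unihiker's 240x320 display.
    cv2.namedWindow("stream", cv2.WINDOW_NORMAL)
    cv2.setWindowProperty("stream", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

    while True:
        ok, frame = cap.read()
        if not ok:
            break  # stream dropped; restart the ESP32 server if this repeats
        cv2.imshow("stream", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```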
Setting up RAG on Minisforum Mini-PC:
To set up the RAG workflow on the mini-PC, follow the detailed guide at this link. The final flow should look somewhat like the screenshots shown in this post.
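Once the guide is done, it is worth checking that the local model actually answers. The snippet below is a minimal sketch assuming the setup exposes an OpenAI-compatible chat endpoint (for example LM Studio's default at http://localhost:1234/v1); the URL and model name are placeholders, so adjust them to whatever your installation reports.

```python
# Quick check that the local chat endpoint on the mini-PC responds.
# The URL and model name are assumptions -- adapt them to the server
# your RAG guide actually sets up.
import requests

API_URL = "http://localhost:1234/v1/chat/completions"  # hypothetical local endpoint

payload = {
    "model": "local-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Suggest a quick dinner using rice and lentils."}
    ],
}

resp = requests.post(API_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```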
The workflow:
Once everything is set up, we start capturing some images!
- We use the Unihiker to click images with the ESP32S3 Sense.
- Pressing the Unihiker's A and B buttons takes a snap.
- The photos can be transferred to the mini-PC and passed to an image description model (see the sketch after this list).
- Once we have the descriptions, we can load them into our RAG workspace and chat with the LLM locally for ideas.
- This can help us make grocery lists, get recipe ideas, check where we have stored something, etc.
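The sketch below shows how the capture-to-RAG step could look, assuming the mini-PC hosts a vision-capable model behind the same OpenAI-compatible endpoint; the endpoint, model name, and output file are all placeholders. The resulting text file is what gets loaded into the RAG workspace.

```python
# Sketch of the capture-to-RAG step: describe each snapshot with a
# vision-capable model and append the result to a plain-text file that
# can then be loaded into the RAG workspace. Endpoint, model name, and
# file names are assumptions, not the exact setup from the guide.
import base64
import requests

API_URL = "http://localhost:1234/v1/chat/completions"  # hypothetical local endpoint

def describe_image(path, model="local-vision-model"):
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "List the kitchen items and ingredients visible in this photo."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    description = describe_image("frame.jpg")
    # Append to a notes file; this file is what we load into the RAG workspace.
    with open("kitchen_inventory.txt", "a") as notes:
        notes.write(description + "\n")
    print(description)
```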
Some images:
The following images show the setup used to capture images.
The project can provide assistance not only to people with vision issues, but also help others manage their indoor activities.
Although we focused on assisting with simple indoor activities, this project could evolve further and help make homes and tools more accessible. Through the use of voice-based AI assistants combined with motorized doors and equipment, we might be able to turn this project into a customized home solution for people who live with motor neuron disease or similar conditions. Some of the improvements to be made are listed below:
- Add voice-based assistance
- Integrate the RAG+LLM workflow to directly communicate with the Unihiker and ESP32
- Add support to access the network from anywhere and store data on the cloud