The scope of this project is to provide a simple way to help people with visual impairment identify or find a specific object within a range of a few meters.
I work in IT, and here I'm applying a logistics solution to a domestic problem: instead of addressing millions of items in large warehouses, we work with hundreds of objects distributed around the house. The magic is made by UHF RFID, operating at 840-960 MHz and working at greater distances than the standard 13.56 MHz RFID. With the antenna provided to me by M5Stack, I can reach over 1 meter. There are different tags available on the market, and they are very cheap. I tested the most common adhesive versions and the washable laundry tag for textile applications. The usage is very simple: there are only 3 buttons, a microphone and a speaker.
With the buttons, I can set the operating mode, and into the microphone I say the name of the object I want to search for or store for later use. The speaker repeats the commands and is used to announce the objects found.
It is worth mentioning how the names are recognized. As I didn't want a long training phase for the voice recognition, I used Whisper from OpenAI on this project. Whisper is a pretrained speech-to-text engine, supporting multiple languages, and available as an API over the internet or as a Docker container to be used on the user's network. Indeed, I installed that version on my mini Kubernetes cluster. To store the RFID codes and object names, I use an SQLite3 database, stored on the SD card connected to the Sense board.
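Just to give an idea of how the firmware can talk to that database, here is a minimal sketch using the SQLite3 library for ESP32; the table and column names (objects, name, epc) are my assumptions, the real layout is the one inside the attached ctrfid.db.

#include <SD.h>
#include <sqlite3.h>

sqlite3 *db = nullptr;   // database handle, reused by the other sketches below

bool openObjectDb() {
  if (!SD.begin()) return false;                        // the database file lives on the SD card
  sqlite3_initialize();
  if (sqlite3_open("/sd/ctrfid.db", &db) != SQLITE_OK)  // path as mounted by the SD library
    return false;
  // Hypothetical table: one row per (object name, EPC code) pair
  const char *ddl = "CREATE TABLE IF NOT EXISTS objects (name TEXT, epc TEXT);";
  char *err = nullptr;
  if (sqlite3_exec(db, ddl, nullptr, nullptr, &err) != SQLITE_OK) {
    sqlite3_free(err);
    return false;
  }
  return true;
}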
Usage
At boot, the ESP32 connects to the network and initializes the different libraries, and if everything is OK, it plays a beep on the speaker. It is possible to decrease the volume by pushing the V button, to not disturb too much, or to connect headphones to the 3.5 mm jack connector on the left side to exclude the speaker. On the first usage the database is empty, so it's necessary to store the objects we have already labelled with the RFID tags.
Function: Store new object
By pressing the ID and F buttons together, the device asks for the name we want to store. At this point, we need to push the V button and pronounce the name of the object. This name is stored immediately in the database. Then the device asks to bring the object closer to the antenna to read the RFID code. This code is stored in the database and associated with the object's name. That's it: the device is ready to accept new commands.
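As a rough sketch of that store step (again assuming the objects table from the sketch above), once Whisper has returned the spoken name and the antenna has returned an EPC string, the association is just an INSERT:

#include <Arduino.h>
#include <sqlite3.h>

extern sqlite3 *db;   // opened in the database sketch above

bool storeObject(const String &name, const String &epc) {
  // Real code should use prepared statements; string concatenation keeps the sketch short
  String sql = String("INSERT INTO objects (name, epc) VALUES ('") + name + "', '" + epc + "');";
  char *err = nullptr;
  if (sqlite3_exec(db, sql.c_str(), nullptr, nullptr, &err) != SQLITE_OK) {
    sqlite3_free(err);
    return false;
  }
  return true;
}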
Function: Find an object
By pressing the F button, the device enters "find" mode and asks for the name of the object we want to find. We again need to push the V button and pronounce the object name. As in the previous example, our voice is sent to the Whisper API to be converted into text, so the device gets back a valid string to search for in the database. If the object is found and there are codes associated with it (one or more), the antenna is enabled, and if one of the codes is recognized in range, the speaker plays a beep. Find mode lasts for 5 minutes or until we press the F button again.
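The lookup behind find mode can be sketched like this: collect every EPC associated with the recognized name, so the scan loop only has to match those strings (table and column names are the same assumptions as above):

#include <Arduino.h>
#include <sqlite3.h>
#include <vector>

extern sqlite3 *db;   // opened in the first database sketch

// sqlite3_exec callback: push the epc column of each row into the vector
static int collectEpc(void *arg, int argc, char **argv, char **) {
  auto *codes = static_cast<std::vector<String> *>(arg);
  if (argc > 0 && argv[0]) codes->push_back(String(argv[0]));
  return 0;
}

std::vector<String> findEpcCodes(const String &name) {
  std::vector<String> codes;
  String sql = String("SELECT epc FROM objects WHERE name = '") + name + "';";
  sqlite3_exec(db, sql.c_str(), collectEpc, &codes, nullptr);
  return codes;
}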
Function: Identify the objects in range
By pressing the ID button, the device enters identify mode. The RFID antenna is enabled and, for any tag in range, the device checks the database and pronounces the respective name using a Text to Speech API, MaryTTS, again installed as a Docker container on Kubernetes.
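To pronounce a name, the device only needs to fetch a WAV from MaryTTS. Here is a minimal sketch of that call, reusing the URL_SERVER, TTS_PATH1 and TTS_PATH2 defines shown in the configuration section below; URL encoding is reduced to replacing spaces, which is enough for simple object names.

#include <WiFi.h>
#include <HTTPClient.h>

// Returns the number of bytes received, or -1 on error
int fetchSpeech(const String &text, uint8_t *outBuf, size_t outSize) {
  String encoded = text;
  encoded.replace(" ", "%20");                          // minimal URL encoding
  String url = String("http://") + URL_SERVER + TTS_PATH1 + encoded + TTS_PATH2;
  HTTPClient http;
  http.begin(url);
  int len = -1;
  if (http.GET() == HTTP_CODE_OK) {
    WiFiClient *stream = http.getStreamPtr();
    len = stream->readBytes(outBuf, outSize);           // WAV data to play on the speaker
  }
  http.end();
  return len;
}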
PCB
I would like to thank PCBWAY for having sponsored this project. I easily uploaded the Gerber file online and in less than one week I received my PCB at home. As the diagram is quite simple, I soldered all the components in a few minutes. I put the buttons on a separate circuit to keep the flexibility of positioning them in a different place from the CPU.
The core of this project is the UHF RFID antenna, sent to me by M5Stack for this project; many thanks for that.
The antenna is really easy to use: it's connected via serial port, and it can be controlled with the UNIT_UHF_RFID.h library provided by M5Stack. This library covers all the functionality, including storing some data on the tags, a capability not required in my project but interesting for others. The best sensitivity is achieved when the tag is perpendicular to the flat side of the antenna. During the tests, I managed to get over 1.5 meters, even with a wall in between, while on average the tag can be detected at around 1 meter of distance. I noticed different performance for different brands of tags.
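A minimal scan sketch with the library follows; the method and field names I use here (begin, pollingMultiple, cards[].epc_str) reflect my reading of the M5Stack library and should be checked against the header of the version you install, and the serial pins are placeholders for your wiring.

#include "UNIT_UHF_RFID.h"

Unit_UHF_RFID uhf;

void scanOnce() {
  // UART link to the antenna unit: baud rate and RX/TX pins depend on the wiring
  uhf.begin(&Serial1, 115200, 44, 43, false);
  uint8_t found = uhf.pollingMultiple(10);     // poll a handful of times
  for (uint8_t i = 0; i < found; i++) {
    Serial.println(uhf.cards[i].epc_str);      // EPC code to match against the database
  }
}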
ESP32 board and software
Thanks to Seeed Studio for having sent me the XIAO ESP32S3 Sense board. Although it's only 21 x 17.5 mm, it's very powerful and provides:
- ESP32 S3 CPU with 8MB of flash and 8MB of PSRAM
- OV2640 Camera (not used on my project)
- MEMS microphone
- SD card slot
The additional PSRAM is mandatory for this project, as the buffer for the audio samples can hold up to 40 seconds. I created two buffers, one for the audio coming from the microphone and one for the audio coming from the Text to Speech API. The SD card was initially used for two purposes:
- SQLite database, storing the object names and object codes
- Cache for the audio coming from Text to Speech
As I was having some database read/write issues while using the cache, I temporarily disabled this capability; maybe in a second iteration I will enable it again, removing the delay of calling the Text to Speech API every time.
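For reference, the two large audio buffers mentioned above only fit thanks to the PSRAM. Here is a minimal allocation sketch, assuming roughly 40 seconds of 16-bit mono audio at 16 kHz (the real buffer sizes in the attached code may differ).

#include <Arduino.h>
#include <esp_heap_caps.h>

constexpr size_t kAudioBufBytes = 40 * 16000 * sizeof(int16_t);   // ~40 s of samples

int16_t *micBuffer = nullptr;   // audio captured from the I2S microphone
int16_t *ttsBuffer = nullptr;   // audio returned by the Text to Speech API

bool allocateAudioBuffers() {
  // Place both buffers in external PSRAM instead of internal RAM
  micBuffer = (int16_t *)heap_caps_malloc(kAudioBufBytes, MALLOC_CAP_SPIRAM);
  ttsBuffer = (int16_t *)heap_caps_malloc(kAudioBufBytes, MALLOC_CAP_SPIRAM);
  return micBuffer != nullptr && ttsBuffer != nullptr;
}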
The audio part uses the provided I2S microphone and a MAX98357 for the audio output, via speaker or the 3.5 mm jack for a headset.
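The volume regulation described in the software notes below is applied to the samples themselves. A minimal sketch of the output path, assuming the legacy driver/i2s.h API and an arbitrary I2S port for the MAX98357:

#include <driver/i2s.h>

float volume = 0.5f;   // 1.0 = full scale, lowered with the V button

void playSamples(int16_t *samples, size_t count) {
  for (size_t i = 0; i < count; i++) {
    samples[i] = (int16_t)(samples[i] * volume);   // attenuate in software
  }
  size_t written = 0;
  i2s_write(I2S_NUM_1, samples, count * sizeof(int16_t), &written, portMAX_DELAY);
}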
The schematic contains 6 buttons, but only 3 are in use in this project, while the remaining 3 share pins with the MAX98357. Speaking about the software, it contains the WiFi and HTTP/HTTPS libraries, I2S, SD management and SQLite libraries. Everything is initialized in the Setup function, together with 2 autonomous tasks:
One for the WiFi connection and one for the I2S sampling, required to not interfere with the main loop. The volume regulation is done in software, by multiplying the integer samples by a float number lower than 1. The MAX98357 provides a gain regulation pin, but it is not practical and cannot really lower the volume beyond a certain point. All the main functions are executed in the Loop section of the code. According to the button pressed, the software enables the different functions described above.
Configuration:
// WiFi network name and password:
const char* ssidName = "YOUR_SSID";
const char* ssidPswd = "YOUR_PASSWORD";
// Use https if required
#define URL_STT "http://<your_container>/whisper/asr?output=json&language=en"
#define URL_SERVER "<your_container>"
#define URL_PATH "/whisper/asr?output=json&language=en"
#define TTS_PATH1 "/marytts/process?INPUT_TEXT="
#define TTS_PATH2 "&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_US"
Before building the software in the Arduino IDE, you need to put in the right values for the SSID and password and, if using internal Docker containers for Whisper and MaryTTS (see below), the IPs. You also need an SD card (I used 8 GB) where to put the empty database ctrfid.db provided.
Web server
For debugging purposes, a simple web server with an SQLite client is available. It can be used to query the database through a simple web form, to check for example whether the objects have been inserted, or to rename or delete them. It's just for debugging, not required for normal usage or the initial setup.
The full code is attached.
Charge station
After the shells, I also 3D printed a simple charge station with the same shape. In this implementation the charge station is simple and provides just 5 volts via magnetic pogo pins. The 3rd pin is used to get the battery voltage from the device. The idea is to have a speaking system announcing the battery voltage; unfortunately I was not able to complete this part. Anyway, there is enough space in the charge station for a microcontroller and a speaker. We should not forget to add an RFID tag also on the charging station.
The STL files are available on Thingiverse.
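For whoever wants to complete the battery announcement idea, a controller inside the charge station could start from something as simple as this; the ADC pin number and the divider ratio are pure assumptions about the wiring.

#include <Arduino.h>

const int kBatterySensePin = 4;     // ADC pin wired to the third pogo pin
const float kDividerRatio = 2.0f;   // adjust if a resistor divider scales the voltage

float readBatteryVolts() {
  uint32_t mv = analogReadMilliVolts(kBatterySensePin);   // calibrated millivolts
  return (mv / 1000.0f) * kDividerRatio;
}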
TTS and STT
Text to Speech and Speech to Text are delegated to external APIs, as I was not able to find embedded solutions that don't require training. I evaluated PicoTTS for the TTS part, which runs on the ESP32, while STT is more challenging: it usually requires 2 GB of RAM and a very powerful CPU. So in the end, as the device is designed for domestic use, I decided to deploy the two solutions using the Docker containers provided. On my desk I have a personal Kubernetes cluster running on a ChromeBOX, with 16 GB of RAM and an Intel Celeron CPU at 1.6 GHz, enough for my needs.
A Raspberry Pi 5 would probably be faster, but I didn't have one available for my tests.
I used Kubernetes on K3s, but any other Docker solution is fine as well; the solution proposed is just one example fitting my needs.
If you don't want to, or can't, run these containers locally, you can still use the public APIs for Whisper and Text to Speech; see below.
STT
Very simple setup, no Deployment or StatefulSet, just a Pod and a Service:
apiVersion: v1
kind: Pod
metadata:
  name: whisper
  namespace: nginx
  labels:
    app: whisper
spec:
  containers:
    - image: onerahmet/openai-whisper-asr-webservice:latest
      name: whisper
      env:
        - name: ASR_MODEL_PATH
          value: /data/whisper
        - name: ASR_ENGINE
          value: faster_whisper
        - name: ASR_MODEL
          value: tiny
      ports:
        - containerPort: 9000
      volumeMounts:
        - mountPath: /data/whisper
          name: volume
  volumes:
    - name: volume
      hostPath:
        path: /data/whisper
---
apiVersion: v1
kind: Service
metadata:
  name: whisper
  namespace: nginx
spec:
  selector:
    app: whisper
  ports:
    - port: 9000
      targetPort: 9000
You can customize the size of the model by changing the ASR_MODEL parameter. I used tiny, which is good enough and requires just a few seconds. Larger models can take more than 10 s, too slow for practical use. Whisper supports multiple languages, with autodetection; however, I statically provide the language on the API call to improve speed. More documentation here: https://ahmetoner.com/whisper-asr-webservice/
And Traefik for the ingress part:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: whisper
  namespace: nginx
spec:
  stripPrefix:
    prefixes:
      - "/whisper"
    forceSlash: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whisper
  namespace: nginx
spec:
  routes:
    - kind: Rule
      match: Host(`nginx.192.168.2.31.nip.io`) && PathPrefix(`/whisper/`)
      priority: 10
      middlewares:
        - name: whisper
          namespace: nginx
      services:
        - kind: Service
          name: whisper
          namespace: nginx
          passHostHeader: false
          port: 9000
Pay attention to the Host parameter; change it to match your network.
TTS
I use MaryTTS, which is good enough and easy to use:
apiVersion: v1
kind: Pod
metadata:
  name: marytts
  namespace: nginx
  labels:
    app: marytts
spec:
  containers:
    - image: synesthesiam/marytts:5.2
      name: marytts
      ports:
        - containerPort: 59125
---
apiVersion: v1
kind: Service
metadata:
  name: marytts
  namespace: nginx
spec:
  selector:
    app: marytts
  ports:
    - port: 9000
      targetPort: 59125
And Traefik for the ingress:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: marytts
  namespace: nginx
spec:
  stripPrefix:
    prefixes:
      - "/marytts"
    forceSlash: false
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: marytts
  namespace: nginx
spec:
  routes:
    - kind: Rule
      match: Host(`nginx.192.168.2.31.nip.io`) && PathPrefix(`/marytts/`)
      priority: 10
      middlewares:
        - name: marytts
          namespace: nginx
      services:
        - kind: Service
          name: marytts
          namespace: nginx
          passHostHeader: false
          port: 9000
As before, adapt the host to match your network.