When I think about digital assistants, I always picture hugely complicated, data-hungry neural networks working behind the scenes. So when I learned that you can train a tiny model that listens for a wake word and run it on a small, low-power chip, I was fascinated, and I decided to build my own wake-word detection project following the guide in the TinyML book by Pete Warden and Daniel Situnayake.
Objective
The goal is to build an embedded application that uses a small machine learning model (around 18 KB), trained on a dataset of speech commands, to classify audio. For simplicity, I followed the book and built a simple model trained to recognize only the words "yes" and "no", and to tell when it has heard an unknown word instead.
Application Architecture
- Obtain audio data as input for the model.
- Heavily preprocess the raw audio to extract features suitable for the model (a rough sketch of this framing step follows the list).
- Unlike a simple classification problem, the application needs to make sense of a stream of inferences, not individual results.
- Post-process the model output and use the board's LED to display the prediction results.
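To give a feel for what that preprocessing involves, here is a rough, standalone C++ sketch of the framing step: one second of 16 kHz audio is cut into overlapping 30 ms windows, and each window is reduced to a single value. The real micro_speech pipeline computes an FFT and a filter-bank spectrum per window rather than a plain RMS energy, and the exact window and stride sizes below are my assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Assumed framing parameters; the real pipeline's values may differ.
constexpr int kSampleRate   = 16000;  // samples per second
constexpr int kWindowSize   = 480;    // 30 ms of audio per frame
constexpr int kWindowStride = 320;    // slide forward 20 ms at a time

// Cuts the signal into overlapping frames and reduces each frame to its RMS
// energy. A real feature extractor would produce a spectrum per frame.
std::vector<float> FrameEnergies(const std::vector<float>& audio) {
  std::vector<float> energies;
  for (std::size_t start = 0; start + kWindowSize <= audio.size();
       start += kWindowStride) {
    float sum_sq = 0.0f;
    for (int n = 0; n < kWindowSize; ++n) {
      sum_sq += audio[start + n] * audio[start + n];
    }
    energies.push_back(std::sqrt(sum_sq / kWindowSize));
  }
  return energies;
}

int main() {
  std::vector<float> audio(kSampleRate, 0.0f);  // one second of silence
  std::vector<float> features = FrameEnergies(audio);
  std::printf("frames per one-second clip: %zu\n", features.size());  // 49 here
}
```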
The model was trained on a dataset called the Speech Commands dataset. This consists of 65,000 one-second-long utterances of 30 short words, crowdsourced online.
Although the dataset contains 30 different words, the model was trained to distinguish between only four categories:
- "yes"
- "no"
- "unknown"
- silence
The model takes in one second's worth of data at a time. It outputs four probability scores, one for each of these four classes, predicting how likely it is that the data represented one of them.
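As a concrete, though simplified, illustration of how those four scores can be turned into a label, here is a short standalone C++ sketch. The label ordering and the 0.8 confidence threshold are my own assumptions for the example, not necessarily what the micro_speech code uses.

```cpp
#include <array>
#include <cstdio>

// Assumed output ordering; the deployed model defines its own label order.
constexpr std::array<const char*, 4> kLabels = {"silence", "unknown", "yes", "no"};

// Picks the highest-scoring class, falling back to "unknown" when the model
// is not confident enough about any single class.
const char* Classify(const std::array<float, 4>& scores, float threshold = 0.8f) {
  int best = 0;
  for (int i = 1; i < 4; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores[best] >= threshold ? kLabels[best] : "unknown";
}

int main() {
  std::array<float, 4> scores = {0.05f, 0.05f, 0.85f, 0.05f};  // fake model output
  std::printf("predicted: %s\n", Classify(scores));  // prints "yes"
}
```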
Training and deployment involved the following steps:
- Follow the notebook in Google Colab to train a simple audio recognition model that recognizes keywords in speech.
- Download the Arduino IDE and install the TensorFlow Lite Arduino library.
- Go to Examples -> TensorFlowLite -> micro_speech to obtain the starter C++ code for deployment.
- Compile the code and upload it to the board.
- Test the application by saying "yes", "no", and other words. In addition, open up the serial monitor and serial plotter to watch the results (see the sketch below for how a stream of scores can be smoothed into a single decision).
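Watching the plotter makes it clear that the application sees a new set of scores on every inference and has to turn that stream into one decision. The micro_speech example ships its own logic for this; the standalone C++ sketch below only illustrates the general idea of averaging recent score vectors and reporting a word once the average clears a threshold. The window length, threshold, and label ordering are all assumptions for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <deque>

constexpr std::size_t kMaxWindow = 3;  // assumed: average the last 3 inferences
constexpr float kThreshold = 0.7f;     // assumed detection threshold
const std::array<const char*, 4> kLabels = {"silence", "unknown", "yes", "no"};

class ScoreSmoother {
 public:
  // Adds the latest score vector, averages the recent window, and returns the
  // winning label, or "silence" while nothing clears the threshold.
  const char* Update(const std::array<float, 4>& scores) {
    window_.push_back(scores);
    if (window_.size() > kMaxWindow) window_.pop_front();

    std::array<float, 4> avg = {0.0f, 0.0f, 0.0f, 0.0f};
    for (const auto& s : window_) {
      for (int i = 0; i < 4; ++i) avg[i] += s[i] / window_.size();
    }
    int best = 0;
    for (int i = 1; i < 4; ++i) {
      if (avg[i] > avg[best]) best = i;
    }
    return avg[best] >= kThreshold ? kLabels[best] : "silence";
  }

 private:
  std::deque<std::array<float, 4>> window_;
};

int main() {
  ScoreSmoother smoother;
  // Fake stream of inference results: noise first, then a clear "yes".
  const std::array<std::array<float, 4>, 4> stream = {{
      {0.90f, 0.05f, 0.05f, 0.00f},
      {0.20f, 0.20f, 0.60f, 0.00f},
      {0.10f, 0.10f, 0.80f, 0.00f},
      {0.05f, 0.05f, 0.90f, 0.00f},
  }};
  for (const auto& scores : stream) {
    // On the board, this is where an LED colour would be set per label.
    std::printf("detected: %s\n", smoother.Update(scores));
  }
}
```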
The application was able to successfully identify "yes", "no", and other "unknown" words. However, it is easily disrupted by environmental noise, such as the sound of cars passing by outside the window. A future improvement would be to add an algorithm that suppresses background noise while preserving the ability to recognize speech.
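One simple direction I might explore (my own speculation, not something taken from the book) is to attenuate low-frequency rumble before feature extraction, since sounds like passing cars carry much of their energy well below speech frequencies. The snippet below is a minimal standalone C++ sketch of that idea using a first-order high-pass filter; the filter coefficient is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
// An alpha close to 1 passes speech frequencies while attenuating content
// far below the cutoff. The 0.95 coefficient is only a placeholder.
std::vector<float> HighPass(const std::vector<float>& x, float alpha = 0.95f) {
  std::vector<float> y(x.size(), 0.0f);
  for (std::size_t n = 1; n < x.size(); ++n) {
    y[n] = alpha * (y[n - 1] + x[n] - x[n - 1]);
  }
  return y;
}

int main() {
  // A constant (0 Hz) signal should be suppressed to (near) zero.
  std::vector<float> dc(100, 1.0f);
  std::vector<float> filtered = HighPass(dc);
  std::printf("last filtered sample: %f\n", filtered.back());
}
```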