We trained a TinyML model that listens for wake words and embedded this voice-recognition application on a low-power chip, the Arduino Nano 33 BLE Sense microcontroller. The application takes different voices as input, listens for a magic word (the wake word), and then signals detection on the chip by lighting an LED. In the real world, as with applications like Google Assistant or Apple's Siri, this TinyML model could be developed further to detect the wake word and notify the phone's operating system to respond.
The Machine Learning Model: To process the raw audio data, the embedded device's feature provider converts it into spectrograms: two-dimensional arrays made up of slices of frequency information. A convolutional neural network (CNN) is well suited to extracting features from this kind of 2D tensor.
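For concreteness, here is a sketch of the feature-tensor layout used by TensorFlow Lite Micro's micro_speech wake-word example, which this application follows; the constant names mirror that example's micro_model_settings.h and should be treated as an illustration of the layout rather than a guaranteed API.

```cpp
#include <cstdint>

// Spectrogram dimensions as in the micro_speech example.
constexpr int kFeatureSliceSize  = 40;  // frequency channels per slice
constexpr int kFeatureSliceCount = 49;  // time slices per 1-second window
constexpr int kFeatureElementCount =
    kFeatureSliceSize * kFeatureSliceCount;  // 1960 values

// The spectrogram is stored as a flat quantized buffer, interpreted as a
// 49 x 40 "image" that the CNN convolves over.
int8_t feature_buffer[kFeatureElementCount];
```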
The CNN model was pre-trained on the Speech Commands Dataset, which consists of 65,000 one-second utterances of 30 short words, crowdsourced online. Although the dataset contains 30 different words, the model was trained to classify the input (the processed 2D tensor generated from the raw audio data) into four possible categories:
- word “yes”
- word “no”
- “unknown” words
- noise/silence.
It outputs a probability score for each of these four classes, predicting how likely it is that the data represents each one.
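For illustration, a minimal sketch of mapping the four scores to labels follows. The label order (silence, unknown, "yes", "no") and the uint8 score type match the micro_speech example, but both are assumptions that depend on how the model was exported.

```cpp
#include <cstdint>

// The four output classes, in the order the model emits them (assumed).
constexpr int kCategoryCount = 4;
const char* kCategoryLabels[kCategoryCount] = {"silence", "unknown",
                                               "yes", "no"};

// Given the model's four scores, pick the most likely class.
const char* MostLikelyCategory(const uint8_t* scores) {
  int best_index = 0;
  for (int i = 1; i < kCategoryCount; ++i) {
    if (scores[i] > scores[best_index]) best_index = i;
  }
  return kCategoryLabels[best_index];
}
```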
Our Approach: 1. Obtain the input and preprocess it into a 2D tensor. Capture raw audio data from the microphone. The model takes in one second's worth of data at a time, extracts its features, and converts them into a viewable 2D tensor (sketched after the figure below).
Visualization of the extracted features for "yes" and "no".
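A sketch of this step, modeled on micro_speech's FeatureProvider and audio provider (function signatures vary across TensorFlow Lite Micro versions; older ones also take an ErrorReporter argument):

```cpp
#include <cstdint>
#include "audio_provider.h"        // LatestAudioTimestamp()
#include "feature_provider.h"
#include "micro_model_settings.h"  // kFeatureElementCount

int8_t feature_buffer[kFeatureElementCount];
FeatureProvider feature_provider(kFeatureElementCount, feature_buffer);
int32_t previous_time = 0;

void CaptureFeatures() {
  // Fold any audio captured since the last call into the spectrogram,
  // one new time slice at a time.
  const int32_t current_time = LatestAudioTimestamp();
  int how_many_new_slices = 0;
  feature_provider.PopulateFeatureData(previous_time, current_time,
                                       &how_many_new_slices);
  previous_time = current_time;
}
```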
2. Set up the interpreter for the TensorFlow Lite model. The C++ code is generated and the TensorFlow Lite environment is set up automatically. The interpreter runs the TensorFlow Lite model, transforms the input spectrogram into a set of probabilities, and picks the one with the highest probability as the output.
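A minimal sketch of the interpreter setup, following the pattern in the micro_speech example. The op list, arena size, and header paths are assumptions to adjust for your model; older TFLM versions also pass an ErrorReporter to the interpreter constructor.

```cpp
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "model.h"  // g_model: the trained model compiled into a C array

constexpr int kTensorArenaSize = 10 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];
static tflite::MicroInterpreter* interpreter = nullptr;

void SetupInterpreter() {
  const tflite::Model* model = tflite::GetModel(g_model);

  // Register only the ops this model needs, to keep the binary small.
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddDepthwiseConv2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();
  resolver.AddReshape();

  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();
}

void RunInference() {
  TfLiteTensor* input = interpreter->input(0);
  // ... fill input->data.int8 with the 1960 spectrogram values ...
  interpreter->Invoke();
  TfLiteTensor* output = interpreter->output(0);  // four class scores
  (void)output;  // post-processed in step 3
}
```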
3. Post-process the model's output to make sense of it. After the model outputs a set of probabilities that a known word was spoken, the RecognizeCommands class determines whether this indicates a successful detection. The CommandResponder then produces an output to let us know which word was detected.
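A sketch of this step, based on micro_speech's RecognizeCommands class and RespondToCommand function (argument lists differ slightly between TFLM versions). RecognizeCommands averages scores over a short window, so a single noisy inference does not trigger a detection.

```cpp
#include "command_responder.h"   // RespondToCommand()
#include "recognize_commands.h"

static RecognizeCommands recognizer;

void PostProcess(TfLiteTensor* output, int32_t current_time) {
  const char* found_command = nullptr;
  uint8_t score = 0;
  bool is_new_command = false;

  // Smooth the latest scores and decide whether a word was really heard.
  recognizer.ProcessLatestResults(output, current_time, &found_command,
                                  &score, &is_new_command);

  // Hand the decision to the device-specific responder (step 4).
  RespondToCommand(current_time, found_command, score, is_new_command);
}
```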
4. Use the resulting information to make things happen. Deploy the application to the microcontroller so that the LED lights up in a different color for each detected word.
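A sketch of a command responder for the Nano 33 BLE Sense, modeled on micro_speech's arduino_command_responder.cc. The color mapping (green for "yes", red for "no", blue for unknown) follows that example; note the board's RGB LED pins are active-low. The real example also keeps the LED lit for a few seconds after a detection; that timing logic is omitted here.

```cpp
#include <Arduino.h>

void RespondToCommand(int32_t current_time, const char* found_command,
                      uint8_t score, bool is_new_command) {
  static bool is_initialized = false;
  if (!is_initialized) {
    pinMode(LEDR, OUTPUT);
    pinMode(LEDG, OUTPUT);
    pinMode(LEDB, OUTPUT);
    is_initialized = true;
  }
  // Turn everything off, then light the color for the detected word.
  digitalWrite(LEDR, HIGH);
  digitalWrite(LEDG, HIGH);
  digitalWrite(LEDB, HIGH);
  if (is_new_command) {
    if (found_command[0] == 'y') digitalWrite(LEDG, LOW);  // "yes" -> green
    if (found_command[0] == 'n') digitalWrite(LEDR, LOW);  // "no"  -> red
    if (found_command[0] == 'u') digitalWrite(LEDB, LOW);  // unknown -> blue
  }
}
```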
Results: The application successfully responds to spoken sequences of "yes" and "no" and drives the LED to indicate which word was detected. However, the application is sensitive to background noise, which slightly reduces the accuracy of the results. It could be further improved by removing noise with techniques such as an LMS adaptive filter, which we plan to try in the future (see the sketch at the end of this section).
The expected effects (the LED responses to each detected word) are shown below:
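As for the LMS adaptive filter mentioned above, here is a minimal sketch of the idea; it is a hypothetical future add-on, not part of the current application. Given a reference noise signal x[n] correlated with the noise in the primary microphone signal d[n], the filter adapts its weights w so that y[n] = w . x[n] estimates the noise, and the error e[n] = d[n] - y[n] is the cleaned signal. The tap count and step size are placeholder choices.

```cpp
#include <cstddef>

constexpr size_t kTaps = 16;  // filter length (assumption)
constexpr float kMu = 0.01f;  // step size; must be small for stability

float weights[kTaps] = {0};
float history[kTaps] = {0};   // most recent reference-noise samples

// Process one sample pair; returns the noise-reduced sample e[n].
float LmsStep(float primary, float reference) {
  // Shift the new reference sample into the history buffer.
  for (size_t i = kTaps - 1; i > 0; --i) history[i] = history[i - 1];
  history[0] = reference;

  // Filter output: current estimate of the noise in the primary channel.
  float estimate = 0.0f;
  for (size_t i = 0; i < kTaps; ++i) estimate += weights[i] * history[i];

  // Error = primary minus estimated noise = cleaned signal.
  const float error = primary - estimate;

  // LMS weight update: w += mu * e[n] * x[n].
  for (size_t i = 0; i < kTaps; ++i) weights[i] += kMu * error * history[i];

  return error;
}
```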