Cynthia Chen (yc90)
Yikun Li (yl212)
Our Goal: We implemented an embedded voice recognition application on a tiny embedded device that takes one second of audio as input and classifies it. Different voice commands are indicated by different colors of LED light. We trained a wake-word detection ML model and deployed it on an Arduino Nano 33 BLE.
The Machine Learning Model: To process the raw audio data, the feature provider on the embedded device converts it into spectrograms: two-dimensional arrays made up of slices of frequency information. The features of this 2D tensor are well suited to extraction by a convolutional neural network (CNN).
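A conceptual sketch of this sliding-window layout follows. The window, stride, and slice sizes match the defaults of TensorFlow's micro_speech example (30 ms windows with a 20 ms stride, 40 buckets per slice, 49 slices over one second of 16 kHz audio), but ComputeFrequencySlice() here is a crude per-chunk energy stand-in for the FFT-based filterbank the example actually uses:

```cpp
#include <stdint.h>
#include <stdlib.h>

constexpr int kWindowSamples = 480;    // 30 ms at 16 kHz
constexpr int kStrideSamples = 320;    // 20 ms at 16 kHz
constexpr int kFeatureSliceSize = 40;  // frequency buckets per slice
constexpr int kFeatureSliceCount = 49; // slices covering one second

// Crude stand-in for the real FFT frontend: average absolute amplitude
// per chunk of the window, scaled down to fit an int8 feature value.
void ComputeFrequencySlice(const int16_t* window, int8_t* out_slice) {
  const int chunk = kWindowSamples / kFeatureSliceSize;  // 12 samples
  for (int b = 0; b < kFeatureSliceSize; ++b) {
    long sum = 0;
    for (int i = 0; i < chunk; ++i) sum += abs(window[b * chunk + i]);
    out_slice[b] = static_cast<int8_t>((sum / chunk) >> 8);
  }
}

// Slide the window over one second of audio to fill the 49 x 40 spectrogram.
void BuildSpectrogram(const int16_t* audio /* 16000 samples */,
                      int8_t* spectrogram /* 49 * 40 values */) {
  for (int s = 0; s < kFeatureSliceCount; ++s) {
    ComputeFrequencySlice(audio + s * kStrideSamples,
                          spectrogram + s * kFeatureSliceSize);
  }
}
```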
The CNN model was pre-trained on the Speech Commands Dataset, which consists of 65,000 one-second-long utterances of 30 short words, crowdsourced online.
Our Approach: 1. Input Data Collection
This component captures raw audio data from the microphone. To verify that the device works in different situations, audio from all of the group members is used as input.
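A minimal capture sketch for the Nano 33 BLE's onboard microphone, assuming the standard Arduino PDM library (the micro_speech example wraps this in its audio provider); the buffer size is illustrative:

```cpp
#include <PDM.h>

static int16_t sample_buffer[512];      // holds incoming 16-bit samples
static volatile int samples_read = 0;

// Interrupt callback: copy whatever the PDM driver has buffered.
void onPDMdata() {
  int bytes_available = PDM.available();
  PDM.read(sample_buffer, bytes_available);
  samples_read = bytes_available / 2;   // two bytes per 16-bit sample
}

void setup() {
  Serial.begin(9600);
  PDM.onReceive(onPDMdata);
  // One channel (mono) at 16 kHz, matching the model's expected input rate.
  if (!PDM.begin(1, 16000)) {
    Serial.println("Failed to start PDM!");
    while (true) {}
  }
}

void loop() {
  if (samples_read > 0) {
    // Hand samples_read samples to the feature provider here.
    samples_read = 0;
  }
}
```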
2. Set up TF Lite interpreter
Import the micro_speech example in the Arduino IDE. The C++ code is generated and the TensorFlow Lite environment is set up automatically. During testing, the interpreter runs the TensorFlow Lite model, transforms the input spectrogram into a set of probabilities, and picks the label with the highest probability as the output.
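A minimal sketch of the setup and inference step the example performs; exact constructor arguments vary across TensorFlow Lite Micro versions, the arena size is illustrative, g_model is the model array shipped with the example, and it assumes the int8-quantized micro_speech model with the label order "silence", "unknown", "yes", "no":

```cpp
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "micro_features/model.h"  // g_model: the trained model as a C array

constexpr int kTensorArenaSize = 10 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

// Runs one inference over a 49 x 40 int8 spectrogram and returns the index
// of the most likely label ("silence", "unknown", "yes", "no").
int Classify(const int8_t* spectrogram, int spectrogram_len) {
  const tflite::Model* model = tflite::GetModel(g_model);
  static tflite::AllOpsResolver resolver;
  // NOTE: older TFLM versions also required an ErrorReporter argument here.
  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize);
  static bool initialized = false;
  if (!initialized) {
    interpreter.AllocateTensors();
    initialized = true;
  }

  // Copy the spectrogram into the input tensor and run the model.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < spectrogram_len; ++i) {
    input->data.int8[i] = spectrogram[i];
  }
  interpreter.Invoke();

  // Pick the label with the highest score.
  TfLiteTensor* output = interpreter.output(0);
  int best = 0;
  for (int i = 1; i < output->dims->data[1]; ++i) {
    if (output->data.int8[i] > output->data.int8[best]) best = i;
  }
  return best;
}
```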
3. Program the Board
Connect the board to the laptop and upload the code to the Arduino board.
4. Test the Functionality
Use the LED on the device to indicate the different audio classes (green: "Yes"; red: "No"). If a command is heard, the command responder uses the device's output capabilities to let us know. We then tested with "Yes" and "No" several times, and the light on the board showed that it works. Because the model is still fairly simple, testing was repeated several times to make sure it behaves reliably; a sketch of the responder follows.
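A minimal sketch of the LED command responder, assuming the Nano 33 BLE core's built-in RGB LED pins (LEDR and LEDG, which are active LOW on this board); the real example implements this in its RespondToCommand() hook with a fuller signature:

```cpp
// Simplified responder: found_command comes from the recognizer.
void RespondToCommand(const char* found_command) {
  // Turn both colors off first (HIGH == off on the Nano 33 BLE).
  digitalWrite(LEDR, HIGH);
  digitalWrite(LEDG, HIGH);

  if (found_command[0] == 'y') {        // "yes" -> green
    digitalWrite(LEDG, LOW);
  } else if (found_command[0] == 'n') { // "no" -> red
    digitalWrite(LEDR, LOW);
  }
}

void setup() {
  pinMode(LEDR, OUTPUT);
  pinMode(LEDG, OUTPUT);
  digitalWrite(LEDR, HIGH);
  digitalWrite(LEDG, HIGH);
}

void loop() {}
```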
The test case of 'Yes':
The test case of 'No':