Introduction and Motivation
Machine learning typically demands enormous computing power, usually in the form of large GPU-equipped data centers, and the cost of training a deep neural network can be astronomical. The emergence of tiny neural networks, some as small as 14 KB, opens the door to a plethora of new applications that analyze data right on the microcontroller itself and derive actionable insights (Warden and Situnayake, 2019). This saves time and avoids latency because data no longer has to be transmitted to a cloud data center, processed, and sent back (Warden and Situnayake, 2019). This approach is called edge computing: data is processed and computed on the same device where it is stored (Lea, 2020).
Learning Process: The Model Training
For starters, I had no idea what edge computing or an Arduino was before I started this project. As the list of technologies demonstrates, I had to work with and orchestrate an entire ecosystem of tools to achieve my goal: deploying a speech recognition model that worked fairly well on an Arduino board.
The first thing I did was download the VS Code IDE and install the PlatformIO extension. In tandem, I downloaded the Arduino IDE and added the TensorFlow Lite library. In VS Code, I imported the Arduino IDE's built-in micro_speech example to serve as a known-good starting point. This model listens for "yes," which turns on the green LED on the Arduino; "no," which turns on the red LED; any other word, which turns on the blue LED; and silence, which leaves the LED off.
The training script was already provided by TensorFlow in its GitHub repository: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb.
This script served as the base for training the micro speech model for my specific use case.
I initially started converting the training script from TensorFlow 1.x to 2.x because of the performance optimizations and simplified API calls in the latter. However, I was not able to convert the entire script because I could not find a TensorFlow 2 equivalent of v1's tf.lite.constants.INT8 module.
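For reference, here is a minimal sketch of what the TF2-style quantization path looks like, as far as I understand it: where v1 referenced tf.lite.constants.INT8, v2 takes the dtype tf.int8 directly on the converter. The saved_model_dir and representative_dataset_gen names are placeholders, not the notebook's actual variables.

```python
import tensorflow as tf

# Sketch of full-integer quantization in TF2. `saved_model_dir` and
# `representative_dataset_gen` are illustrative placeholder names.
def quantize_to_int8(saved_model_dir, representative_dataset_gen):
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8   # was tf.lite.constants.INT8 in v1
    converter.inference_output_type = tf.int8
    return converter.convert()
```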
Tinkering with this script was actually a huge challenge, mainly because every change to the model meant around two hours of training, so there was no fast, immediate feedback on the hyperparameters I chose. I could not do "rapid prototyping" by experimenting often and manipulating many parameters. The most maddening problem came after training: when I tried to quantize the model and generate a TensorFlow Lite model from it, I would get an error message like this:
After a lot of research, it turned out this was most likely caused by creating a nested graph every time I re-ran the snippet of code in the Colab cell, so new placeholders never received their needed input values during the evaluation phase. I resolved the issue by restarting my GPU runtime every time I re-ran the training script.
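In hindsight, something like the sketch below might have avoided the runtime restarts: explicitly clearing the default graph before re-running the cell, so stale placeholders from a previous run cannot leak into the new graph. This is my assumption about a fix, not something I verified at the time.

```python
import tensorflow as tf

# Clear the TF1-style default graph before rebuilding the model in a
# re-run Colab cell (the script runs v1-style code via tf.compat.v1).
tf.compat.v1.reset_default_graph()
# ...rebuild the model and re-run training/quantization from here...
```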
The initial accuracy I got was 87.2%, training on "go" and "stop," so I thought I could improve it by increasing the number of training steps from 15,000 to 25,000. I also, foolishly, tweaked another variable at the same time: the number of wake words the model would recognize, adding "backward" and "forward" to the list.
My accuracy for the model trained with 25,000 steps actually decreased to 85%. It took almost double the time, 3.5 hours, yet the accuracy dropped! Two factors could explain this: 1) the confusion matrix got larger, so the model had to discriminate among a larger set of words and accuracy decreased; 2) peak performance was actually around 15,000 training steps, and the model had entered a phase of diminishing returns.
I went back to the original 15,000 training steps with the two additional wake words, and the final accuracy for my model was 85.8%. From this result I conclude that 15,000 steps is most likely where this model reaches its peak performance.
From my previous assignments in the class, learning rates of 0.001 and 0.0001 had produced the models with the highest accuracies, so I used those to train my model.
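Roughly, the notebook's configuration cell looked like this with my settings. The variable names follow TensorFlow's micro_speech training notebook; the exact split of steps between the two learning rates is an assumption on my part.

```python
# Wake words and training schedule, as comma-separated strings the
# speech_commands training script expects.
WANTED_WORDS = "go,stop,forward,backward"
TRAINING_STEPS = "12000,3000"    # 15,000 steps in total
LEARNING_RATE = "0.001,0.0001"   # higher rate first, then fine-tuning
TOTAL_STEPS = str(sum(map(int, TRAINING_STEPS.split(","))))
```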
The other adjustment I made was to the actual architecture of the model. There were several to choose from, including a full-sized convolutional neural network. I initially selected MODEL_ARCHITECTURE = 'conv' and tried to deploy that model to my Arduino. The model turned out to be too large for the Arduino to handle, so it shut down, throwing a stack of "Request failed" errors whenever I spoke the wake words into the microphone. I learned that the only model architectures the Arduino can handle are the ones with "tiny" in the name, such as "tiny_conv," which worked the best, and "tiny_embedding_conv." These architectures were already pre-packaged in the training script and ready for use.
The "tiny_conv" model has a convolutional layer, then a fully connected layer with 4 × 4000 weights, and a final softmax layer.
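The actual implementation lives in TensorFlow's speech_commands code; below is my rough Keras re-statement of it. The input shape (49 × 40 spectrogram) and the 8 filters are assumptions based on the default feature settings, and with the original 4 labels the final dense layer is exactly the 4 × 4000 weight matrix mentioned above.

```python
import tensorflow as tf

def tiny_conv(label_count=6):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(49, 40, 1)),   # time frames x frequency bins
        tf.keras.layers.Conv2D(8, (10, 8), strides=(2, 2),
                               padding="same", activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Flatten(),                  # 25 * 20 * 8 = 4000 features
        tf.keras.layers.Dense(label_count),         # 4 x 4000 weights when label_count=4
        tf.keras.layers.Softmax(),
    ])
```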
The words I picked to train the model evolved over time. The example model started by recognizing "yes" and "no," which are distinct enough from one another that they are not easily confused, unlike pairs such as "on" and "off." I then added "forward" and "backward" to make the model act like a traffic/parking signal: the LED turns red when "STOP" or "BACKWARD" is uttered and green when "GO" or "FORWARD" is spoken. Any word outside these four categories turns the LED blue, and silence leaves the LED off.
The micro speech model essentially transforms the audio captured by the microphone into a spectrogram, then runs it through TensorFlow Lite to classify which word was spoken. Once training finishes, the script converts the trained, quantized model into a C source file called model.cc, which encodes the model's bytes and its length as a C array; that data then goes into micro_features_model.cpp for deployment.
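The notebook performs this conversion with the xxd tool; as a minimal illustration of what that step produces, here is a pure-Python stand-in (file names and the g_model variable name are illustrative, not the project's actual identifiers).

```python
# Turn a .tflite flatbuffer into a C array, the way `xxd -i` does.
def tflite_to_c_array(tflite_path, out_path, var_name="g_model"):
    with open(tflite_path, "rb") as f:
        data = f.read()
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const int {var_name}_len = {len(data)};")
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")

tflite_to_c_array("model.tflite", "model.cc")
```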
I used TensorBoard to track the progression of accuracy and cross-entropy, which should increase and decrease respectively. The red lines show performance on the validation dataset, which is evaluated only periodically, hence the sparse data points. The blue lines show performance on the training dataset, which is evaluated regularly, so the points are much closer together.
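For reference, the notebook brings TensorBoard up inside Colab with the cell magics below; LOGS_DIR is my recollection of the log-directory variable's name, so treat it as an assumption.

```python
# Display the training curves inline in the Colab notebook.
%load_ext tensorboard
%tensorboard --logdir {LOGS_DIR}
```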
There is a variable called kCategoryCount that denotes the total number of categories the model must classify: the wake words plus "unknown" and "silence." For my particular use case, kCategoryCount equals 6.
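A quick sanity check of where the 6 comes from (the label names here are illustrative, not the exact strings in the C++ source):

```python
# Four wake words plus the two built-in categories gives kCategoryCount = 6.
wanted_words = ["go", "stop", "forward", "backward"]
category_labels = ["silence", "unknown"] + wanted_words
assert len(category_labels) == 6   # matches kCategoryCount
```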
Learning Process: The Deployment to Arduino
So far, I have discussed the challenges of training the model and pushing its accuracy as high as possible. The next hurdle was deploying the model to the Arduino.
The biggest challenge that plagued me was an error saying I had defined my model twice in my project folder. I spent an enormous chunk of my time combing through all the different files to resolve it. It turned out I had forgotten to remove an old #include statement that referenced a different model from the one I was trying to deploy. To avoid future confusion, I gave my model a simple name and deleted any other files related to other models; I simply cleaned up my project folder and settled for minimalism. The error is displayed below.
Once I cleared this hurdle, I was quickly greeted with another brick wall. The program compiled, but I could not upload it to the Arduino: the error said no upload port was detected. I got very anxious because I thought my MacBook Pro was having issues detecting external devices. I tested this theory by plugging in a USB drive to see if it would be recognized, and it was. Left scratching my head, I moved development to my Windows computer. Everything worked as expected until it was time to deploy my model to the Arduino, and I got the exact same error. Now I started getting really worried. Could my Arduino be broken? Had I been sent a defective unit? I even moved from Windows 7 to Windows 10 and got the same error! The final variable to check was my micro USB cable. I had been using the old micro USB cable that charged my Android phone; it was literally the last thing left to change. If switching cables still did not let me upload the model to the Arduino, there would be nothing I could do except write about how I was unable to upload it.
The screenshots below show the successful compilation but failed uploads.
I swapped my micro USB cable for another one that was longer and not meant specifically for charging phones…and this was the amazing result!
It turns out the upload port could not be detected because of a faulty micro USB cable! I could not have been more relieved.
After resolving those two issues, I was free to deploy models to the Arduino. The port error did come up again, but this time I found a quick solution recommended by Arduino: "If the board does not enter the upload mode, please do a double press on the reset button before the upload process is initiated; the orange LED should slowly fade in and out to show that the board is waiting for the upload." Performing this reset on the Arduino resolved any port detection errors.
Further research also recommended cleaning up the build files before uploading, as demonstrated in this screenshot.
Learning these nifty tricks will catapult me into the next phase of working with Arduinos.
Result of My Model
Now that all these roadblocks were cleared, it was time for the fun part: testing my micro speech model to see if it actually recognized the wake words and turned on the corresponding LED!
I could see the results in the Arduino Serial Monitor, which prints the wake word the Arduino interpreted. Though it was not super accurate, it managed to get all the categories! When I said nothing, the LED stayed off; "forward" and "go" turned on the green LED; "stop" and "backward" turned on the red LED; and words outside these four categories, like "dog" and "ball," turned on the blue LED. I did have to repeat myself multiple times for "stop" and "dog," but overall the Arduino responded to all the wake words I chose. This demonstrates that tiny, compressed neural networks have issues with accuracy, a trade-off we must consider (Warden and Situnayake, 2019).
I have included two videos of my demo; they did not go as well as the runs described above, because the Arduino failed to recognize me saying "stop" at all. This comes down to the luck of the draw: these tiny quantized models still struggle with accuracy, which leads to wide variation in results.
Future Work
Being able to deploy tiny deep learning models on an Arduino revolutionizes what can be done in IoT and edge computing, and the possibilities are endless largely because of the significant reductions in cost. There is still plenty of room for improvement, as the model's accuracy could be better. Improving accuracy is still somewhat of a black-box activity, especially with technologies this new, so thorough experimentation across multiple parameters would get us closer to a better model. The audio input to this particular model is very simple, and the next logical step would be to train on full phrases to control the Arduino, whether that still means turning on different colored LEDs or responding in some other way. Even though at this stage these Arduino implementations are proofs of concept (POCs) rather than scalable, mass-produced systems, they give us a glimpse into this cutting-edge field. Using the power of model compression, we can stack and combine the solutions these POCs offer to tackle harder, time-sensitive, safety-critical problems such as robotic surgery or self-driving cars.
References
Lea, P. (2020). IoT and Edge Computing for Architects – Second Edition. O’Reilly eBooks [online]. Available at: https://learning.oreilly.com/library/view/iot-and-edge/9781839214806/ (Accessed: 14 December 2020).
Warden, P. and Situnayake, D. (2019). TinyML. O’Reilly eBooks [online]. Available at: https://learning.oreilly.com/library/view/tinyml/9781492052036/ (Accessed: 10 November 2020).
Warden, P. (2017). Launching the Speech Commands Dataset. Google AI Blog [online]. Available at: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html (Accessed: 5 December 2020).
Seeing It In Action