Motivation
In this project, we built an application with TensorFlow Lite to recognize the spoken commands "Stop" and "Go". The project is inspired by, and retrains, the TensorFlow Lite micro speech (wake-word detection) example, which detects 'yes' or 'no' in voice commands and is robust against unknown words and background noise. Such models provide a voice user interface (UI) designed to give instant access to information without the need for a screen or keyboard, and can be an important module of digital assistants like Google Assistant, Apple's Siri, and Amazon Alexa. Given the wide applications of wake-word detection and the handy tinyML devices available, we decided to retrain the model for our own scenario of interest, as motivated below.
While 'yes' and 'no' commands can be helpful for digital assistants, they limit users to answering binary questions, which may be too indirect for some application scenarios like self-driving cars. For example, to stop the car with a voice command, instead of waiting for the vehicle to ask 'Do you want to stop the car now?' and answering 'yes', we can simply say 'stop'. This is where our 'go'/'stop' idea comes from.
Specifically, our model is trained to recognize the words "go" and "stop," and is also capable of distinguishing them from unknown words, silence, and background noise. Once trained, the device listens to its surroundings with a microphone and indicates that it has detected 'go' by lighting a green LED, and 'stop' by lighting a red LED.
All of this is accomplished by processing the captured audio and feeding the result to the TensorFlow model. Next, we introduce the training pipeline.
Model
Training
We chose the Arduino Nano 33 BLE as our hardware platform and trained the model with the TensorFlow simple audio recognition script. The script downloads the dataset, trains the model, and outputs a model file.
- Configuration
Before training, we need to install the dependencies necessary for running the script. A specific TensorFlow version (1.x) is required:
# Select TensorFlow 1.x in Colab
%tensorflow_version 1.x
import tensorflow as tf
First, we specify the keywords we aim to detect. The training script can then be configured with a number of command-line flags, which also control which words the model is trained to classify. The relevant flags are explained below.
os.environ["WANTED_WORDS"] = "go, stop"
The "go" and "stop" are included in the dataset. If the word is not included in the dataset, then it would be classified as an"unknown" category. It is possible to choose more than two words in a comma-separated list.
os.environ["TRAINING_STEPS"]="15000,3000"
os.environ["LEARNING_RATE"]="0.001,0.0001"
The model's weights and biases are adjusted over the training iterations; as the number of steps increases, the model gets closer to the desired behavior, but too many steps risks overfitting, so we want a moderate number of training steps. The learning rate controls how quickly the parameters are adjusted: a high learning rate makes them converge faster but less precisely, so we train first at a high rate and then fine-tune at a lower one.
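The two comma-separated lists pair up by position: the model first trains for 15,000 steps at a learning rate of 0.001, then fine-tunes for 3,000 steps at 0.0001. A minimal sketch of how these flags map to training stages (summing the stages into TOTAL_STEPS to locate the final checkpoint is an assumption based on the standard micro_speech notebook):
```
import os

steps = [int(s) for s in os.environ["TRAINING_STEPS"].split(",")]
rates = [float(r) for r in os.environ["LEARNING_RATE"].split(",")]

# Stage 1: 15000 steps at learning rate 0.001 (coarse convergence)
# Stage 2:  3000 steps at learning rate 0.0001 (fine-tuning)
for stage, (n, lr) in enumerate(zip(steps, rates), start=1):
    print(f"Stage {stage}: {n} steps at learning rate {lr}")

TOTAL_STEPS = sum(steps)  # 18000; used later to name the final checkpoint
```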
- Training
The following script downloads the dataset and begins training.
!python tensorflow/tensorflow/examples/speech_commands/train.py \
--data_dir={DATASET_DIR} \
--wanted_words={WANTED_WORDS} \
--silence_percentage={SILENT_PERCENTAGE} \
--unknown_percentage={UNKNOWN_PERCENTAGE} \
--preprocess={PREPROCESS} \
--window_stride={WINDOW_STRIDE} \
--model_architecture={MODEL_ARCHITECTURE} \
--how_many_training_steps={TRAINING_STEPS} \
--learning_rate={LEARNING_RATE} \
--train_dir={TRAIN_DIR} \
--summaries_dir={LOGS_DIR} \
--verbosity={VERBOSITY} \
--eval_step_interval={EVAL_STEP_INTERVAL} \
--save_step_interval={SAVE_STEP_INTERVAL}
log output:
W1208 04:03:38.352888 139879410399104 train.py:322] Final test accuracy = 88.4% (N=1221)
- Visualize the accuracy and loss
We load TensorBoard to visualize the accuracy and loss as training proceeds.
%load_ext tensorboard
%tensorboard --logdir {LOGS_DIR}
TensorBoard can also show the inputs being fed into the model, displayed under the IMAGES tab.
- Freeze the model
Next, we combine the relevant training results (graph, weights, etc.) into a single file for inference. This process is known as freezing a model, and the result is known as a frozen model/graph: it cannot be further retrained afterwards.
!rm -rf {SAVED_MODEL}
!python tensorflow/tensorflow/examples/speech_commands/freeze.py \
--wanted_words=$WANTED_WORDS \
--window_stride_ms=$WINDOW_STRIDE \
--preprocess=$PREPROCESS \
--model_architecture=$MODEL_ARCHITECTURE \
--start_checkpoint=$TRAIN_DIR$MODEL_ARCHITECTURE'.ckpt-'{TOTAL_STEPS} \
--save_format=saved_model \
--output_file={SAVED_MODEL}
- Generate a TensorFlow Lite model
Convert the frozen graph into a TensorFlow Lite model, which is fully quantized for use with embedded devices.
For details, refer to https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech/train
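As a minimal sketch of this conversion step, assuming the TensorFlow 1.x converter API pinned earlier: the representative dataset below yields random placeholder features (in practice, real spectrograms from the dataset should be used so the quantizer sees realistic value ranges), and the shape (1, 1960) corresponds to the flattened 49x40 spectrogram the micro_speech model expects.
```
import numpy as np
import tensorflow as tf

# SAVED_MODEL and MODEL_TFLITE are the notebook's path variables.
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Full-integer quantization: int8 inputs and outputs for the microcontroller.
converter.inference_input_type = tf.lite.constants.INT8
converter.inference_output_type = tf.lite.constants.INT8

def representative_dataset_gen():
    # Placeholder: in practice, yield ~100 real feature vectors here.
    for _ in range(100):
        yield [np.random.rand(1, 1960).astype(np.float32)]

converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()
with open(MODEL_TFLITE, 'wb') as f:
    f.write(tflite_model)
```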
- Testing accuracy
Float model accuracy is 87.796888% (Number of test samples=1221)
Quantized model accuracy is 87.796888% (Number of test samples=1221)
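These numbers come from running every test example through the TFLite interpreter. A minimal sketch of that check, assuming a hypothetical test_set list of (features, label) pairs (not a name from the notebook):
```
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=MODEL_TFLITE)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

correct = 0
for features, label in test_set:
    # Quantize the float features into the model's int8 input range.
    scale, zero_point = input_details['quantization']
    quantized = np.round(features / scale + zero_point).astype(np.int8)
    interpreter.set_tensor(input_details['index'], quantized.reshape(1, -1))
    interpreter.invoke()
    prediction = interpreter.get_tensor(output_details['index']).argmax()
    correct += int(prediction == label)

print('Quantized model accuracy:', correct / len(test_set))
```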
- Generate a TensorFlow Lite model for Arduino
# Install xxd if it is not available
!apt-get update && apt-get -qq install xxd
# Convert to a C source file
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
# Update variable names
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}
- Deploy to Arduino
The new kCategoryCount and kCategoryLabels will be:
```
constexpr int kCategoryCount = 4;
const char* kCategoryLabels[kCategoryCount] = {
    "silence",
    "unknown",
    "go",
    "stop",
};
```
Updating command_responder.cc
Change from:
```
if (found_command[0] == 'y') {
  last_command_time = current_time;
  digitalWrite(LEDG, LOW);  // Green for yes
}
```
to:
```
if (found_command[0] == 'g') {
  last_command_time = current_time;
  digitalWrite(LEDG, LOW);  // Green for go
}
```
and from:
```
if (found_command[0] == 'n') {
  last_command_time = current_time;
  digitalWrite(LEDR, LOW);  // Red for no
}
```
to:
```
if (found_command[0] == 's') {
  last_command_time = current_time;
  digitalWrite(LEDR, LOW);  // Red for stop
}
```
(On the Nano 33 BLE, the built-in RGB LED is active-low, so writing LOW turns the LED on.)
Loading to Arduino
Demonstration Video
The final demo is shown below. We also recorded the serial monitor window.
For comparison, we also say the word "paper" in the video; it is recognized as the "unknown" category.