In this project, we show you in a few simple steps how to get your ESP-EYE to recognize the hand gestures for rock, paper, and scissors. Follow along until the end to learn how to train the ESP-EYE to recognize your own gestures.
Step One: How to play
Install git if it's not already installed and clone our project repository.
git clone https://github.com/itemis/tflite-esp-rock-paper-scissors
Install Python version >=3.10.
sudo apt install python3 python3-pip python-is-python3
Navigate to the project root folder and install the required Python modules.
cd path/to/tflite-esp-rock-paper-scissors
pip install -r requirements.txt
Then, install the required C++ libraries; we need TensorFlow Lite.
chmod +x src/tinyml_deployment/update_components.sh
./src/tinyml_deployment/update_components.sh
We also need to install ESP-IDF version 4.4.2. First install the packages required for ESP-IDF. Then create a directory for ESP versions and clone the ESP-IDF repository into it.
sudo apt install git wget flex bison gperf python3 python3-venv cmake ninja-build ccache libffi-dev libssl-dev dfu-util libusb-1.0-0
mkdir -p ~/esp
cd ~/esp
git clone -b v4.4.2 --recursive https://github.com/espressif/esp-idf.git esp-idf-v4.4.2
After the download, run the ESP-IDF installation script.
cd esp-idf-v4.4.2/
./install.sh all
Then you should add your user to the dialout group for write access to the serial port over USB. You will need to log in again for the change to take effect.
sudo usermod -a -G dialout $USER
Add an alias to your .bashrc so that you can conveniently load the ESP-IDF environment from any directory.
cat >> ~/.bashrc << EOL
alias get_idf='. $HOME/esp/esp-idf-v4.4.2/export.sh'
EOL
Start a new terminal for the change to take effect. Now navigate to the rock, paper, scissors repo. It's time to build and flash the code.
cd src/tinyml_deployment/
get_idf
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor
The port name /dev/ttyUSB0 may vary on your system. You can list the active USB serial ports with the following command.
ls /dev/ttyUSB*
Check the serial monitor connected to the ESP. The following output will appear; when it does, show your hand around 30 cm in front of the ESP. Position the ESP so that its micro USB port points up when the camera is facing your hand.
################## first round! #################
3!
2!
1!
Show your hand!
After a short moment, the ESP will make its move.
AI plays: scissors!
To play the next round, fully cover the camera with your hand.
Step Two: How it works
In this section, we give you a brief summary of how the MCU is able to run an AI model.
The first step to run the AI on the MCU is to convert the existing TensorFlow model to a TensorFlow Lite model, which can be interpreted by the MCU. We run a TensorFlow Lite interpreter on the ESP-EYE that expects a trained model in the form of a C-array, which looks something like this:
const unsigned int model_weights_len = 11064;
const unsigned char model_weights[] DATA_ALIGN_ATTRIBUTE = {
0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x00, 0x00,
0x14, 0x00, 0x20, 0x00, 0x1c, 0x00, 0x18, 0x00, 0x14, 0x00, 0x10, 0x00,
0x0c, 0x00, 0x00, 0x00, 0x08, 0x00, 0x04, 0x00, 0x14, 0x00, 0x00, 0x00,
...
};
The model is loaded into the interpreter with a single line of code:
model = tflite::GetModel(model_weights);
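The project's exact setup code lives in the repository; as a rough orientation, the usual TensorFlow Lite Micro pattern for wiring the model to an interpreter looks like the sketch below. The tensor arena size and the list of registered operations are placeholders, not the project's actual values.
// Rough sketch of the usual TFLite Micro setup; the arena size and the
// operation list are placeholders and depend on the actual network.
constexpr int kTensorArenaSize = 100 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

static tflite::MicroMutableOpResolver<4> resolver;
resolver.AddConv2D();
resolver.AddMaxPool2D();
resolver.AddFullyConnected();
resolver.AddSoftmax();

static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;
interpreter->AllocateTensors();

// Handles to the model's input and output tensors
TfLiteTensor* model_input  = interpreter->input(0);
TfLiteTensor* model_output = interpreter->output(0);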
After initializing the model, we can send images to the TensorFlow Lite interpreter. We do this by copying an array representing an image to the model_input array that is managed by the interpreter.
std::copy(img_float.begin(), img_float.end(), model_input->data.f);
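How the image is prepared depends on the camera driver; as an illustration only, a minimal sketch could scale 8-bit grayscale pixels into floats like this. The raw_pixels buffer and the scaling to the range [0, 1] are assumptions, not necessarily the project's exact preprocessing.
// Illustrative only: scale a hypothetical 96x96 8-bit grayscale camera
// buffer to floats before handing it to the interpreter.
// Needs <array> and <algorithm>.
constexpr size_t kImageSize = 96 * 96;
extern const uint8_t raw_pixels[kImageSize];  // hypothetical camera buffer

std::array<float, kImageSize> img_float;
for (size_t i = 0; i < kImageSize; ++i) {
    img_float[i] = raw_pixels[i] / 255.0f;
}
std::copy(img_float.begin(), img_float.end(), model_input->data.f);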
We use a super loop to continuously take images and pass them to the model. After taking an image, the interpreter's invoke function is called to present the input to the model. The input propagates through the neural network; this is known as a forward pass. The result is an activation level for each of three variables, one each for rock, paper, and scissors. Each activation is a measure of how likely the network thinks each class is.
We've organized the loop into steps of reading data and interpreting the model's output.
void loop() {
  // Grab a new image and convert it into the model's input format
  feature_provider.SetInputData(data_provider.Read());
  feature_provider.ExtractFeatures();
  feature_provider.WriteDataToModel(model_input);

  // Run the forward pass
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    error_reporter->Report("Invoke failed");
    return;
  }

  // Translate the output activations into a game move
  auto prediction = prediction_interpreter.GetResult(model_output);
  prediction_handler.Update(prediction);

  vTaskDelay(0.5 * pdSECOND);
}
The class probabilities produced by the network are written to an array that we can access as follows.
float paper = model_output->data.f[0];
float rock = model_output->data.f[1];
float scissors = model_output->data.f[2];
The largest probability will be the ESP-EYE's prediction!
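Picking the prediction then boils down to an argmax over the three values. A minimal sketch; the label array and the print statement are illustrative, not the project's exact output code.
// Illustrative only: choose the class with the highest probability.
// Needs <algorithm> and <cstdio>.
const float probs[]  = {paper, rock, scissors};
const char* labels[] = {"paper", "rock", "scissors"};
const int best = std::max_element(probs, probs + 3) - probs;
printf("AI plays: %s!\n", labels[best]);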
Step Three: Prepare your own data and train your own model
For example, you could extend the model to recognize the lizard and spock gestures.
When gathering images, the more, the better. To collect images efficiently, start your webcam and film your hand doing variations of the lizard gesture.
Install ffmpeg to convert the video to images.
ffmpeg -i input.mp4 -vf fps=30 out%d.png
Repeat this for the spock gesture, and keep the two sets of images separate.
Next, navigate to the project root and create directories for your added classes.
cd path/to/project
mkdir -p data/raw_images/spock
mkdir -p data/raw_images/lizard
Copy the images extracted from the video into the newly created folders.
cp path/to/spock_images path/to/project/data/raw_images/spock
cp path/to/lizard_images path/to/project/data/raw_images/lizard
Convert the images from large RGB images to 96x96 grayscale images.
python src/data_preprocessing/preprocessing_pipeline.py
The script resizes the images, converts them to grayscale, and creates folders called "train" and "test" in the data folder, which are required to train the model. If you don't want to collect data now, you can download our ready-made dataset here.
Now that the data is ready, the model can be trained. Navigate to the project root and make the necessary scripts executable.
chmod +x src/model_pipeline.sh
chmod +x src/tf_lite_model/tflite_to_c_array.sh
Run the following script to train the model and convert it from TensorFlow to TensorFlow Lite format. The resulting C-array is automatically copied to the C++ code.
./src/model_pipeline.sh
Below, you see the output produced during training – the higher the accuracy (train set) and validation accuracy (test set), the better!
Epoch 1/6
480/480 [==============================] - 17s 34ms/step - loss: 0.4738 - accuracy: 0.6579 - val_loss: 0.3744 - val_accuracy: 0.8718
Epoch 2/6
216/480 [============>.................] - ETA: 7s - loss: 0.2753 - accuracy: 0.8436
As a last step, you will need to modify the C++ code to integrate lizard and spock into the event loop; a sketch is shown below. After that, we are ready to build and flash our model using the same ESP-IDF commands shown in step two. Good luck!
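For orientation, the extended output handling could look like this; the index positions of lizard and spock are assumptions and have to match the label order used during training.
// Sketch: reading five class probabilities instead of three.
float paper    = model_output->data.f[0];
float rock     = model_output->data.f[1];
float scissors = model_output->data.f[2];
float lizard   = model_output->data.f[3];  // assumed index, check your labels
float spock    = model_output->data.f[4];  // assumed index, check your labels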
Fin
If you liked our rock, paper, scissors tutorial and want to delve deeper, you may also enjoy reading our explanation-focused article on the same topic. If you prefer working with Jupyter notebooks, we've also got you covered with a Jupyter-based project template.