Inspired by the ShAIdes project, we wanted to build a similar wearable, but with all processing done onboard using tinyML. The project allows the user to control the device they are looking at through voice commands.
Architecture: The project consists of two main components:
- An OpenMV camera running the image classification model
- An ESP32 with an external microphone, running a keyword spotting model to detect voice commands and send them to the intended device via the ESP-NOW protocol
Whenever a known object is detected (in our case, a lamp or a television), the corresponding pin on the OpenMV camera is set high for 2 seconds. If, during this 2-second window, the keyword spotting model running on the ESP32 recognizes a command (in our case, either "On" or "Off"), the recognized command is sent to the selected device using the ESP-NOW protocol.
Image Classification model: We created an image classification model using the Edge Impulse platform and collected around 80 images for each class. There are three classes in total: lamp, television, and unknown.
It is very important to collect pictures of all other appliances in the room/house during the dataset creation and group them under the unknown category.
The MobileNetV2 0.05 model in Edge Impulse was used for transfer learning.
The trained model was then deployed on the OpenMV camera and was able to achieve 4 inferences per second.
Training the Keyword Spotting model: The keyword spotting model was trained in Google Colab to recognize the "On" and "Off" keywords. You can easily retrain the model to recognize other keywords present in the speech_commands dataset.
After training, the model was converted to a TFLite model, and post-training quantization was applied to reduce its size for deployment on embedded devices.
Save this TFLite model to your desktop and run the following command to convert it into a C character array. Keep the generated file, as we will be using it at a later stage.
xxd -i converted_model.tflite > model_data.cc
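For reference, the generated model_data.cc is simply the model serialized as a C array; it should look roughly like this (the byte values and the length shown here are placeholders, yours will differ):

// model_data.cc (generated by xxd -i)
unsigned char converted_model_tflite[] = {
  0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x00, 0x00,
  // ... thousands more bytes ...
};
unsigned int converted_model_tflite_len = 18712;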
Model training takes around 2 hours to complete, so we have added the trained model_data.cc file to the GitHub repo.
Setting up ESP-IDF: Initially, we tried deploying the keyword spotting model to the ESP32 using the Arduino TensorFlow Lite library for ESP32, but we ran into TFLite Micro version compatibility issues, which always threw the following error:
Didn't find op for builtin opcode 'CONV_2D' version '2'
This error was only observed after post-training quantization. Since we were unable to deploy using the Arduino IDE, we moved on to the official Espressif framework, ESP-IDF.
You can easily install the ESP-IDF extension in Visual Studio Code using this tutorial. Once the development environment is set up, we can move on to building our tinyML application.
Building and Deploying our TinyML Application: Since we will be building our application on top of the TensorFlow micro_speech example, we first need to clone the tflite-micro repository to our local machine.
git clone https://github.com/tensorflow/tflite-micro.git
Move into the cloned tflite-micro folder and run the following command to generate the micro_speech example project for the ESP32.
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=esp generate_micro_speech_esp_project
Now we can open the generated project in VS Code from the following location:
tensorflow/lite/micro/tools/make/gen/esp_xtensa-esp32_default/prj/micro_speech
Now copy the character array from the "model_data.cc" file into the "model.cc" file in the project.
Next, in the micro_model_settings.cc file, change the labels to the ones you have trained the model with. Finally, change the value of kCategoryCount in the micro_model_settings.h file to the total number of labels.
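For our "On"/"Off" model, the edited files could look roughly like the sketch below, assuming the example's default "silence" and "unknown" categories are kept. The labels must be in the same order as the model's output tensor.

// micro_model_settings.h (excerpt): one slot per label the model can output
constexpr int kCategoryCount = 4;
extern const char* kCategoryLabels[kCategoryCount];

// micro_model_settings.cc (excerpt): labels in the model's output order
const char* kCategoryLabels[kCategoryCount] = {
    "silence",
    "unknown",
    "on",
    "off",
};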
This basic application just prints the detected keyword to the serial monitor. If we want to take some action based on the detected keywords, we need to modify the command_responder files.
We have modified the files to send commands to the selected device using Espressif's ESP-Now protocol. You can check out the code in the GitHub repo link attached at the end.
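As a rough illustration of the idea (this is not the exact code from our repo: the function signature follows the micro_speech example this project is generated from, and the peer MAC address is a placeholder), the modified RespondToCommand() could look like this:

// command_responder.cc (sketch): forward the recognized keyword over ESP-NOW.
// Assumes Wi-Fi and ESP-NOW were initialized elsewhere (esp_now_init(),
// esp_now_add_peer(), etc.).
#include "command_responder.h"

#include <cstring>

#include "esp_now.h"

static const uint8_t kReceiverMac[6] = {0x24, 0x6F, 0x28, 0x00, 0x00, 0x01};  // placeholder

void RespondToCommand(tflite::ErrorReporter* error_reporter,
                      int32_t current_time, const char* found_command,
                      uint8_t score, bool is_new_command) {
  if (!is_new_command) {
    return;
  }
  TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                       score, current_time);
  // Forward only the keywords we trained for; "silence"/"unknown" are ignored.
  if (strcmp(found_command, "on") == 0 || strcmp(found_command, "off") == 0) {
    esp_now_send(kReceiverMac, reinterpret_cast<const uint8_t*>(found_command),
                 strlen(found_command) + 1);
  }
}

In the full project, the destination address is not fixed but is chosen from the camera's object-selection pins, as described in the "Combining Visual and Audio Perception" section below.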
That's it! We can now build and flash our firmware onto the ESP32. This video shows how to build, flash, and monitor using the ESP-IDF VS Code extension.
Interfacing an External Microphone with the ESP32: After seeing the performance of different external microphones with the ESP32 in a blog post, we decided to use the INMP441 microphone. If you would like to use a different microphone, change the audio_provider files in the project accordingly. You can find the details about interfacing the microphone with the ESP32 at the end.
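For reference, a minimal ESP-IDF configuration (legacy I2S driver) for reading 16 kHz audio from an INMP441 could look like the sketch below; the pin numbers are assumptions and should match your wiring:

// audio_provider sketch: configure the ESP32 I2S peripheral for an INMP441
#include "driver/i2s.h"

void InitI2SMicrophone() {
  i2s_config_t i2s_config = {};
  i2s_config.mode = static_cast<i2s_mode_t>(I2S_MODE_MASTER | I2S_MODE_RX);
  i2s_config.sample_rate = 16000;                          // micro_speech expects 16 kHz audio
  i2s_config.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT;  // INMP441 sends 24-bit data in 32-bit slots
  i2s_config.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT;   // L/R pin tied to GND -> left channel
  i2s_config.communication_format = I2S_COMM_FORMAT_STAND_I2S;
  i2s_config.intr_alloc_flags = 0;
  i2s_config.dma_buf_count = 4;
  i2s_config.dma_buf_len = 512;

  i2s_pin_config_t pin_config = {};
  pin_config.bck_io_num = 26;                  // INMP441 SCK (assumed)
  pin_config.ws_io_num = 25;                   // INMP441 WS  (assumed)
  pin_config.data_out_num = I2S_PIN_NO_CHANGE; // microphone is receive-only
  pin_config.data_in_num = 33;                 // INMP441 SD  (assumed)

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, nullptr);
  i2s_set_pin(I2S_NUM_0, &pin_config);
}

The 32-bit samples read from the I2S peripheral still need to be shifted down to the 16-bit range that the micro_speech feature provider expects.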
ESP-NOW is a protocol developed by Espressif that enables multiple devices to communicate with one another without a Wi-Fi connection. It is power-efficient and convenient to deploy.
We send the command from the ESP32 running the keyword spotting model to the ESP32 connected to the device we want to control. Here, we have connected the receiver ESP32 to a relay module to control the lamp.
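On the receiver side, a minimal sketch could look like the following; the relay GPIO, the payload format (a null-terminated keyword string), and the ESP-IDF 4.x receive-callback signature are all assumptions:

// esp_now_receiver sketch: switch a relay based on the received keyword
#include <cstring>

#include "driver/gpio.h"
#include "esp_event.h"
#include "esp_netif.h"
#include "esp_now.h"
#include "esp_wifi.h"
#include "nvs_flash.h"

static const gpio_num_t kRelayPin = GPIO_NUM_5;  // assumed relay control pin

static void OnDataReceived(const uint8_t* mac_addr, const uint8_t* data, int len) {
  if (len <= 0) return;
  const char* command = reinterpret_cast<const char*>(data);
  if (strncmp(command, "on", len) == 0) {
    gpio_set_level(kRelayPin, 1);   // switch the lamp on
  } else if (strncmp(command, "off", len) == 0) {
    gpio_set_level(kRelayPin, 0);   // switch the lamp off
  }
}

extern "C" void app_main() {
  // ESP-NOW needs the Wi-Fi driver started, even without joining a network.
  nvs_flash_init();
  esp_netif_init();
  esp_event_loop_create_default();
  wifi_init_config_t wifi_config = WIFI_INIT_CONFIG_DEFAULT();
  esp_wifi_init(&wifi_config);
  esp_wifi_set_mode(WIFI_MODE_STA);
  esp_wifi_start();

  gpio_set_direction(kRelayPin, GPIO_MODE_OUTPUT);

  esp_now_init();
  esp_now_register_recv_cb(OnDataReceived);
}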
Combining Visual and Audio Perception: When an object is detected by the OpenMV camera, it sets the corresponding pin high. When a keyword is detected on the ESP32, it reads the values of the pins connected to the camera to determine the selected device, and then sends the command to that device.
In our case, pin P0 is set high when a lamp is detected and pin P1 is set high when the television is detected by the camera. If a keyword is detected on the ESP32 at the same time, it checks the values of these pins to determine whether the command should be sent to the lamp, the TV, or neither of them.
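As a sketch, this selection step on the ESP32 could look like the code below; the GPIO numbers and MAC addresses are placeholders for whatever pins you wired to the camera's P0/P1 outputs and the actual MACs of your receiver boards:

// device_selector sketch: read the OpenMV object-selection pins to decide
// where a recognized keyword should be sent
#include "driver/gpio.h"

static const gpio_num_t kLampSelectPin = GPIO_NUM_21;  // wired to OpenMV P0 (assumed)
static const gpio_num_t kTvSelectPin = GPIO_NUM_22;    // wired to OpenMV P1 (assumed)

static const uint8_t kLampMac[6] = {0x24, 0x6F, 0x28, 0x00, 0x00, 0x01};  // placeholder
static const uint8_t kTvMac[6] = {0x24, 0x6F, 0x28, 0x00, 0x00, 0x02};    // placeholder

// Returns the ESP-NOW peer for the device currently in view, or nullptr if
// neither pin is high. Assumes both pins were configured as inputs with
// gpio_set_direction() during setup.
const uint8_t* SelectedDevice() {
  if (gpio_get_level(kLampSelectPin) == 1) {
    return kLampMac;
  }
  if (gpio_get_level(kTvSelectPin) == 1) {
    return kTvMac;
  }
  return nullptr;
}

The command responder then passes the returned address to esp_now_send(), so a recognized keyword is only forwarded when one of the known objects is actually in view.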
Results: When the ESP-EYE module is pointed towards the lamp and the "On" command is heard, a signal is sent via the ESP-NOW protocol from the ESP-EYE module to the ESP32 connected to the relay module, which switches on the lamp. This demonstrates that our entire pipeline works end to end!
Future Scope: Currently, the system used for this proof of concept is not compact enough to be used as a wearable, so we are working on porting the project to the compact ESP-EYE development board, which includes all three required peripherals: a camera, a microphone, and a Wi-Fi module.
To support speech-impaired users, a gesture-based control mechanism could be used instead of audio control. Such devices could be useful for people with cerebral palsy, helping them control home appliances with ease and giving them some independence.