In a previous project, I demonstrated how we can run object detection and tracking using Python. For that project, I used a YOLOv5 model, the Edge Impulse Linux Python SDK, and custom software that tracks objects using a webcam, all running on my personal computer. The Linux Python SDK also lets you run machine learning models and collect sensor data on other Linux machines (PCs, single-board computers such as the Raspberry Pi, etc.) using Python.
This got me wondering how we could bring the same capability to low-cost, resource-constrained, low-power devices like microcontrollers. How does a personal computer or Raspberry Pi compare to MCUs such as Arduino and ESP boards? The former are more expensive, consume more power, and are less autonomous, but they have more powerful CPUs and far more RAM and flash (ROM) storage.
Recent advances in microcontrollers and Machine Learning models have allowed us to upload lightweight ML models to MCUs and achieve remarkable performance. What if we could implement object detection and tracking on a microcontroller board like the ESP-EYE or ESP32-CAM? The main challenge this time was to implement the Python centroid-tracker algorithm in C++.
Introduction
This follow-up project demonstrates how to implement object detection and tracking of the found objects using microcontroller boards such as the ESP-EYE. I chose the ESP-EYE board because of its low cost, low power consumption, onboard 2MP camera, Wi-Fi capabilities, 8 MB PSRAM, and small form factor, and because I already had this board from a previous TinyML project :)
Some of the ESP-EYE specifications are:
- ESP32 dual core Tensilica LX6 processor with Wi-Fi and Bluetooth
- 8MB PSRAM
- 4MB Flash
- Onboard 2MP OV2640 Camera
- Onboard MEMS Microphone
Here is a link to the public Edge Impulse project: Object Detection - nuts. The source files for this project are available in this GitHub repository: object_detection_and_tracking-centroid_based_algorithm (in the branch with-cpp_ESP-demo).
The Arduino sketch Object_detection_and_tracking_with_FOMO.ino is the main file and it performs the following processes:
- Captures an image using the ESP's onboard camera
- Processes the image and passes it to the FOMO object detection model that returns bounding boxes
- If bounding boxes are found, then they are passed to a centroid tracker algorithm
- Serves an HTTP web page that shows a live camera feed with the tracked objects
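At a high level, the sketch's loop ties these steps together. Here is a simplified outline only (the stub functions are placeholders, not the actual code from the repository; the real sketch uses the ESP32 camera driver, the Edge Impulse run_classifier() API, and the repository's CentroidTracker and web-server code):

```cpp
// Simplified outline of the flow in Object_detection_and_tracking_with_FOMO.ino.
// The stub functions below stand in for the camera driver, the Edge Impulse
// inference call, and the repository's tracker/web-server code.
#include <vector>

struct Box { int x, y, width, height; };  // a FOMO bounding box (simplified)

static bool captureFrame()                         { /* camera capture   */ return true; }
static std::vector<Box> runObjectDetection()       { /* FOMO inference   */ return {}; }
static void trackObjects(const std::vector<Box> &) { /* centroid tracker */ }
static void serveWebClients()                      { /* HTTP live feed   */ }

void setup() {
  // Initialize the camera, Wi-Fi, and the HTTP server here.
}

void loop() {
  if (!captureFrame()) return;                    // 1. capture an image
  std::vector<Box> boxes = runObjectDetection();  // 2. FOMO returns bounding boxes
  if (!boxes.empty()) trackObjects(boxes);        // 3. update the centroid tracker
  serveWebClients();                              // 4. stream results to the browser
}
```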
The Machine Learning pipeline (Impulse), that is, the image input processing, feature extraction, and neural network blocks, has been exported as an Arduino library. This Arduino library is generated by the Edge Impulse platform and packages all of your signal processing blocks, configuration, and learning blocks into a single package. It also allows us to include the C++ FOMO model in an Arduino sketch. The library is included in the repository with the file name ei-object-detection--nuts-arduino-1.0.50.zip. Additionally, you can easily clone the public Edge Impulse project, reconfigure it, and deploy a different Arduino library :)
This tutorial has been tested with the ESP-EYE development board and it should also work with the ESP32-CAM AI Thinker. Please feel free to test it with other ESP SoCs, modules, and DevKits (consider PSRAM availability, as it affects performance).
Centroid-tracker algorithm, in C++
From the previous project, I had an understanding of how the centroid tracking algorithm works, as described in the Introduction section. However, the code was written in Python. Yes, we can run MicroPython on the ESP board, but after some investigation and trials I found that installing the MicroPython ports of NumPy and SciPy is very difficult. The bigger challenge, simply stated, was that the amount of data generated by the calculations could not fit on the ESP board: the ESP32 kept giving me "MemoryError: memory allocation failed" errors when I attempted even simple test data processing.
To simplify the problem, I decided to re-write the Python centroid tracker code in C++. The C++ code implements the same processes as the Python code.
Conveniently, the C++ approach is also a better solution since it can be easily integrated into projects that target MCUs. Just one C++ file is now needed to add object tracking to your TinyML MCU project!
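To make the idea concrete, here is a minimal sketch of a centroid tracker in plain C++. It is an illustration of the approach only, not the code in centroid_tracker.h: it matches each known object to its nearest unused detection greedily, whereas the original Python implementation sorts all pairwise distances before matching.

```cpp
#include <map>
#include <utility>
#include <vector>

// Minimal centroid tracker sketch: each known object keeps an ID and a centroid.
// Detections are matched to the nearest known centroid; unmatched objects
// accumulate a "disappeared" count and are dropped once it exceeds maxDisappeared.
struct SimpleCentroidTracker {
  int nextId = 0;
  int maxDisappeared = 5;                            // frames an object may be missing
  std::map<int, std::pair<float, float>> objects;    // id -> centroid (x, y)
  std::map<int, int> disappeared;                    // id -> missed-frame count

  void registerObject(std::pair<float, float> c) {
    objects[nextId] = c;
    disappeared[nextId] = 0;
    nextId++;
  }

  void deregisterObject(int id) {
    objects.erase(id);
    disappeared.erase(id);
  }

  void update(const std::vector<std::pair<float, float>> &detections) {
    if (detections.empty()) {                        // nothing seen this frame
      std::vector<int> toDrop;
      for (auto &kv : disappeared)
        if (++kv.second > maxDisappeared) toDrop.push_back(kv.first);
      for (int id : toDrop) deregisterObject(id);
      return;
    }
    if (objects.empty()) {                           // first detections ever
      for (auto &c : detections) registerObject(c);
      return;
    }
    // Greedy nearest-neighbour matching between known objects and detections.
    std::vector<bool> used(detections.size(), false);
    std::vector<int> missed;
    for (auto &kv : objects) {
      float best = 1e9f;
      int bestIdx = -1;
      for (size_t i = 0; i < detections.size(); i++) {
        if (used[i]) continue;
        float dx = kv.second.first - detections[i].first;
        float dy = kv.second.second - detections[i].second;
        float d = dx * dx + dy * dy;                 // squared distance is enough
        if (d < best) { best = d; bestIdx = (int)i; }
      }
      if (bestIdx >= 0) {
        kv.second = detections[bestIdx];             // move object to its new centroid
        disappeared[kv.first] = 0;
        used[bestIdx] = true;
      } else {
        missed.push_back(kv.first);                  // no detection left for this object
      }
    }
    for (int id : missed)
      if (++disappeared[id] > maxDisappeared) deregisterObject(id);
    // Any detection not matched to an existing object becomes a new object.
    for (size_t i = 0; i < detections.size(); i++)
      if (!used[i]) registerObject(detections[i]);
  }
};
```

In a sketch, update() would be called once per frame with the centroids computed from the FOMO bounding boxes.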
I chose to use this open-source Edge Impulse FOMO object detection project. The project trains an ML model that can detect and localize nuts, bolts, and washers in an image. I selected this project because I was interested in a use case that would perform well with lower-resolution images, such as 48x48 or 96x96 pixels, since less data to process means lower latency. The ESP-EYE board is resource constrained and we are limited in the amount of data that can be processed, so reducing the image size is one way of optimizing the ML model. In a previous project, I found that a FOMO model inference on the ESP-EYE has a latency of ~850ms (about 1 frame per second) with a 96x96 image, versus ~200ms (about 5 frames per second) with a 48x48 image.
First, I removed all the images for the bolts and washers classes. The resulting project has only 141 labelled images of nuts across training and testing.
For the Impulse design, I settled on an image width and height of 96x96 pixels and set the resize mode to Squash. The processing block is set to “Image” and the learning block to “Object Detection (Images)”.
The next step was to use the processing block (Image) to generate features from the training dataset. Since I wanted to use FOMO, I set the color depth to grayscale.
The last step was to train the model. For the Neural Network architecture, I settled on FOMO (Faster Objects, More Objects) MobileNetV2 0.1, with 200 training cycles (epochs) and a learning rate of 0.0015.
After training, the model has an F1 score of 98%. The F1 score combines precision and recall into a single metric (their harmonic mean: F1 = 2 × precision × recall / (precision + recall)), so a high score requires both to be high. I chose this as acceptable performance and proceeded to test the model with the unseen (test) data.
When training the model, I used 81% of the data in the dataset. The remaining 19% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit by testing it on new data. If the model performs poorly on this data, it has likely overfit (memorized the training data). This can be resolved by adding more data, reconfiguring the processing and learning blocks, or adding data augmentation. More performance-tuning tips can be found in this guide.

To test the model, we click "Model testing" and then "Classify all". In my tests, the model had an accuracy of 82%. Note that this is not an assurance that the model is perfect: the dataset was small, and the model will not perform well in other situations, say with background objects or in a differently lit environment. We can fix this by adding more data to the project.
To deploy the model to the ESP-EYE, first go to the “Deployment” section. Next, click on the “Search deployment options” field and select Espressif ESP-EYE (ESP32).
To increase performance on the board, I selected the Quantized (int8) model optimization with the EON Compiler. With this option, the model uses 235.5K of RAM and 64.3K of flash on the ESP-EYE board.
Click “Build” and the firmware will be downloaded once the build ends. Connect an ESP-EYE board to your computer, extract the downloaded firmware, and run the appropriate script (based on your computer's Operating System) to upload it to your board. Great! Now we have a FOMO model running locally on the ESP-EYE.
To get a live feed of the camera and classification, run the command:
edge-impulse-run-impulse --debug
Next, enter the provided URL in a browser and you will see the live feed and the objects detected by the ESP-EYE board.
TinyML Object detection and tracking on an MCU 👀
- To add object tracking, first clone/download the GitHub repository object_detection_and_tracking-centroid_based_algorithm (in the branch with-cpp_ESP-demo).
- Next, add the ei-object-detection--nuts-arduino-1.0.50.zip Arduino library to the Arduino IDE. In the IDE, go to Sketch > Include Library > Add .ZIP Library...
- If needed, install the esp32 (by Espressif Systems) board package in the IDE using the Boards Manager. In my case, I installed version 2.0.14 and it worked well.
- If using the ESP-EYE board, select the M5-Stack-Timer-CAM board under Tools. It should be in the ESP32 Arduino list. Use the default settings for the board configuration.
- Open the Object_detection_and_tracking_with_FOMO.ino file and put your Wi-Fi SSID (Wi-Fi network name) and password in the variables ssid and password respectively. If you are not using the ESP-EYE board, make sure to uncomment the board that you are using among the CAMERA_MODEL_ defines (and comment out CAMERA_MODEL_ESP_EYE to prevent conflicts); see the illustrative snippet after this list.
- Connect your ESP board to the computer and upload the code.
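For illustration, this configuration step typically amounts to a few lines near the top of the sketch. The snippet below is only a sketch of the pattern described above; the exact lines and the available CAMERA_MODEL_ defines in the repository may differ slightly:

```cpp
// Select the camera board (only one CAMERA_MODEL_ define should be active).
#define CAMERA_MODEL_ESP_EYE        // default board for this project
// #define CAMERA_MODEL_AI_THINKER  // e.g. uncomment this (and comment out the
                                    // line above) if you use an ESP32-CAM AI Thinker

// Wi-Fi credentials used by the sketch to join your network.
const char *ssid     = "your-wifi-name";
const char *password = "your-wifi-password";
```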
The ESP board will then connect to the Wi-Fi network and print its IP address on Serial. Finally, use a web browser to open the IP address. You will then see a web page with the live feed coming from the ESP board. When a nut is detected, its centroid will be shown as a red circle, and the nut will be tracked and assigned an ID.
I then mounted the ESP-EYE board on a support and placed nuts below it.
On checking the camera feed, we can see the software accurately detecting the nuts and tracking them with their IDs. This is impressive!
On the ESP-EYE, the Digital Signal Processing (DSP) time is 8 ms and the FOMO model's inference (classification) latency is around 558 ms, which gives around 1.8 fps (frames per second).
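That figure checks out: one full frame takes about 8 ms + 558 ms = 566 ms, and 1000 ms ÷ 566 ms ≈ 1.77 frames per second.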
Remember that the Machine Learning object detection, the object tracking, and the serving of the HTML web page are all done locally on the ESP board. It's fascinating how far low-power, resource-constrained, low-cost devices have advanced in recent years.
Some hacks that were needed 🔧
The default Edge Impulse ESP Arduino library for object detection models only prints the inference results on Serial. You can, however, deploy the Impulse as an Edge Impulse firmware and use the Edge Impulse CLI to run the firmware with a live camera feed. That alternative, though, does not allow you to add custom code to the firmware.
In this case, the Object_detection_and_tracking_with_FOMO.ino Arduino code streams the current image captured by the ESP board. This does not change the type of image data; the code simply streams the image before it is processed. The streamed image is RGB, but for FOMO we use grayscale images.
Afterwards, the JavaScript code in the served HTML page (defined in index_html) uses the bounding boxes in the JSON data to draw circles around each detected nut's calculated centroid.
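On the ESP side, packaging the detections as JSON can be as simple as looping over the Edge Impulse inference result. The sketch below is only an illustration of this step, assuming the standard bounding-box fields exposed by the Edge Impulse Arduino library; the actual payload format in the repository may differ:

```cpp
#include <Arduino.h>
#include <Object_Detection_-_nuts_inferencing.h>  // the deployed Edge Impulse library

// Build a JSON array of detected bounding boxes from an inference result.
String boxesToJson(const ei_impulse_result_t &result) {
  String json = "[";
  bool first = true;
  for (size_t ix = 0; ix < result.bounding_boxes_count; ix++) {
    auto bb = result.bounding_boxes[ix];
    if (bb.value == 0) continue;       // FOMO marks empty grid cells with value 0
    if (!first) json += ",";
    first = false;
    json += "{\"label\":\"";
    json += bb.label;                  // e.g. "nut"
    json += "\",\"value\":";
    json += String(bb.value, 2);       // confidence, two decimal places
    json += ",\"x\":";
    json += String(bb.x);
    json += ",\"y\":";
    json += String(bb.y);
    json += ",\"width\":";
    json += String(bb.width);
    json += ",\"height\":";
    json += String(bb.height);
    json += "}";
  }
  json += "]";
  return json;
}
```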
Tracking different objects 🚀
Changing the tracked object is very simple, and I would highly recommend training and deploying the Machine Learning model using Edge Impulse. To do this, follow these steps:
- Train a Machine Learning model on Edge Impulse Studio and deploy it as an Arduino library. This puts your model in a C++ file.
- Add the deployed Arduino library to the IDE.
- In Object_detection_and_tracking_with_FOMO.ino, replace #include <Object_Detection_-_nuts_inferencing.h> with the header of your deployed library.
- In utils.cpp, replace nut in the JavaScript variable objects_name with the name of the object that you are tracking. This name will be displayed on the web page.
- Currently, if an object is not located in 5 consecutive frames (images), the centroid tracker algorithm marks it as disappeared and deregisters it. If you want to adjust this limit, update the maxDisappeared CentroidTracker variable in centroid_tracker.h (a hypothetical example follows this list).
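As a hypothetical illustration of that last step (the actual declaration in centroid_tracker.h may be written differently), tolerating 10 missed frames instead of 5 would look something like this:

```cpp
// In centroid_tracker.h: how many consecutive frames an object may be missing
// before it is deregistered (illustrative declaration; value raised from 5 to 10).
int maxDisappeared = 10;
```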
Feel free to integrate this centroid tracker C++ file with other MCU development boards. Many MCU boards that can run C++ are currently supported by Edge Impulse, including those with AI accelerators!