The introduction of the ESP32-CAM has opened up possibilities for adding a camera to existing IoT projects. After exploring embedded ML with Edge Impulse on the RP2040 in my previous project, I was intrigued by the potential for object detection on the ESP32-CAM. Given that the ESP32-S features 520 KB of RAM and a CPU frequency of up to 240 MHz, it seemed promising that a simple object detection model could be deployed on the board.
In this project, the ESP32-CAM is set up to recognize a human hand and a fist against a white background. If a hand is detected, the LED brightens; if a fist is detected, the LED dims.
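The hand/fist behavior boils down to a small piece of decision logic. The sketch below is a hypothetical helper (not the actual project code) that maps a detected label to a new 8-bit PWM duty cycle for the LED; on the real board the returned value would be passed to the ESP32's LEDC PWM peripheral.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: map the detected label to a new LED duty cycle.
// "hand" steps the brightness up, "fist" steps it down, and any other
// label leaves it unchanged. Duty is an 8-bit PWM value (0-255).
int nextLedDuty(const std::string& label, int duty, int step = 32) {
    if (label == "hand") return std::min(255, duty + step);
    if (label == "fist") return std::max(0, duty - step);
    return duty;
}
```

Clamping to the 0-255 range keeps repeated detections from wrapping the duty cycle around.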
Steps involved
The overall process moves from gathering the data -> training the model -> deploying the model, using a mix of offline methods and online tools. The initial and final steps are done offline, while the model training is handled by Edge Impulse.
A dataset can be provided to Edge Impulse either by connecting directly from a smartphone (the easiest option) or by uploading files from the local PC. In this case, I chose the latter, capturing the samples with the ESP32-CAM itself, since the same camera sensor will be used for inference.
Acquisition from ESP CAM
Although gathering sample images from the ESP32-CAM may seem tedious, I created a sketch that hosts a webpage allowing the user to capture photos. The user can decide whether an image is good enough to be saved to the SD card attached to the board. The images can later be retrieved from the SD card and uploaded to Edge Impulse.
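One small detail such a sketch has to handle is giving each saved photo a unique path in the SD card's root directory. The helper below is a guess at one simple scheme (sequentially numbered JPEG names); the actual sketch may name its files differently.

```cpp
#include <cstdio>
#include <string>

// Hypothetical naming scheme for photos saved to the SD card root:
// /img_000.jpg, /img_001.jpg, ... The index would be incremented
// each time the user chooses to keep a captured frame.
std::string imagePath(int index) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "/img_%03d.jpg", index);
    return std::string(buf);
}
```

Zero-padding the index keeps the files sorted in capture order when listed on the PC.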
Working on this task helped me better understand the file systems and memory architecture of the ESP32-S, and taught me how to transfer files from the flash memory to the SD card.
Steps to take Image Samples
After uploading the sketch, check the local IP address in the Serial Monitor once the connection is established. Entering this IP in a web browser brings up a basic web interface for capturing, viewing, and saving images.
Loading Images to Edge Impulse
This step is fairly straightforward: the files are manually uploaded to Edge Impulse from the local computer.
Step 1: Select the "Upload data" button to upload files manually.
Step 2: Connect the SD card to the computer and upload the files. The data acquisition sketch saves the images in the root directory, so uploading them as a folder is not possible.
Once the samples are collected, the images must be labeled so the model knows what to categorize.
Labels are created by drawing a bounding box around the desired region.
The labels used in this project are "hand" and "fist".
I collected about 40 samples for each label and kept the training/test split at 80/20. The samples in the test set are used later to validate the model after training.
- Creating the Impulse
On the "create impulse" tab of Edge Impulse, I set the image width and height as small as possible, i.e. 48 x 48 pixels. The main reason is that a lower resolution needs less processing time and gives a faster response, although accuracy will be affected. Additionally, I added the existing image processing block and the "object detection (images)" learning block.
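Since the camera delivers frames larger than 48 x 48, each frame has to be scaled down before inference. The function below is an illustrative nearest-neighbour downscale of a grayscale buffer to the model's input size; it is a sketch of the kind of resizing the Edge Impulse SDK performs internally, not the SDK's actual code.

```cpp
#include <cstdint>
#include <vector>

// Nearest-neighbour resize of a w x h grayscale image down to the
// 48 x 48 input resolution chosen for the impulse. Each output pixel
// samples the nearest source pixel; fast, at the cost of some detail.
std::vector<uint8_t> resizeTo48(const std::vector<uint8_t>& src, int w, int h) {
    const int W = 48, H = 48;
    std::vector<uint8_t> dst(W * H);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            dst[y * W + x] = src[(y * h / H) * w + (x * w / W)];
    return dst;
}
```

For a 96 x 96 source this simply keeps every second pixel in each direction.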
- Extracting Image features
This is where features are extracted for each label. It is necessary to set the color depth to "grayscale" to minimize the processing load on the ESP32.
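Going to grayscale cuts the input from three channels to one. As an illustration of that reduction, the helper below converts an RGB888 pixel to 8-bit grayscale using the standard ITU-R BT.601 luma weights; Edge Impulse's grayscale color depth performs an equivalent single-channel reduction, though its exact coefficients are not shown here.

```cpp
#include <cstdint>

// Convert one RGB888 pixel to 8-bit grayscale with the classic
// BT.601 luma weights (0.299 R + 0.587 G + 0.114 B), done in
// integer arithmetic to suit a microcontroller.
uint8_t rgbToGray(uint8_t r, uint8_t g, uint8_t b) {
    return uint8_t((299u * r + 587u * g + 114u * b) / 1000u);
}
```

Dropping two of the three channels also shrinks the feature buffer by two-thirds, which matters within the ESP32's 520 KB of RAM.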
Upon generating the features, this is the spectrum I obtained. The farther apart the features of each label are, the easier the classification. In this case, some features sit close together in the middle, which can cause detection errors.
- Configuring the Neural Network
This is the workspace where we feed the parameters to the neural network. The number of training cycles refers to the number of epochs used to train the network; more cycles take more time to train the model.
The learning rate defines how fast the neural network adapts. This value should be chosen carefully, as a rate that is too high can make training unstable and the results ambiguous.
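The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which stands in for a real loss surface: a modest rate converges smoothly toward the minimum, while a rate that is too large (greater than 1 for this function) makes each update overshoot and diverge.

```cpp
#include <cmath>

// Toy illustration of the learning rate: gradient descent on
// f(w) = w^2, whose gradient is 2w. Each epoch takes one step
// of size lr against the gradient and returns the final weight.
double descend(double w, double lr, int epochs) {
    for (int i = 0; i < epochs; ++i)
        w -= lr * 2.0 * w;
    return w;
}
```

With lr = 0.1 the weight shrinks by a factor of 0.8 per epoch; with lr = 1.5 it doubles in magnitude each epoch, which is the instability the warning above refers to.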
For the ESP32-CAM, the model is set to FOMO (Faster Objects, More Objects) MobileNetV2 0.1 because it is one of the optimized models that the ESP32 can support.
To check the performance of the model, certain samples are used as a validation set. Among the datasets used to train the model, roughly 20% of the samples were reserved for validation and obtaining the F1 score.
I experimented with these parameters to achieve maximum performance. Upon initially setting the training cycles to 30 and the learning rate to 0.005, I was getting an F1 score of 86%. However, upon changing these to 60 and 0.001, the score jumped by 5%.
The log also mentioned that a TensorFlow Lite model was created.
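At inference time, a FOMO model reports a list of detected objects with their grid coordinates and confidence scores. The snippet below uses a simplified stand-in for that result list (the real Edge Impulse SDK exposes a similar array in its result struct, with different type names) and keeps only detections above a confidence threshold, which is the usual first step before acting on the output.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the bounding boxes a FOMO model returns.
// The actual Edge Impulse SDK types differ; this only models the idea.
struct Detection {
    std::string label;   // e.g. "hand" or "fist"
    float confidence;    // 0.0 - 1.0
    int x, y, w, h;      // box in model-input (48x48) coordinates
};

// Keep only detections at or above the confidence threshold.
std::vector<Detection> filterDetections(const std::vector<Detection>& in,
                                        float threshold = 0.5f) {
    std::vector<Detection> out;
    for (const auto& d : in)
        if (d.confidence >= threshold) out.push_back(d);
    return out;
}
```

A surviving "hand" or "fist" detection is what would drive the LED brightness change described earlier.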
4. Model Testing
During data acquisition, the training and test sets were divided into approximately 80% and 20%, respectively. In this step, the model is tested on roughly 24 samples of hands and fists that it has not seen before. This gives an accurate picture of the model's performance on real-world data.
Out of the 24 test samples, 4 failed, giving an overall accuracy of 83%. One way to improve the accuracy would be to collect additional training samples. The training cycles and learning rate can also be tweaked for better performance.
5. Deployment
Edge Impulse is quite flexible for model deployment. Options are available to export the model as firmware, as a library, for use in a browser, and more.
Since the ESP32 is already supported in the Arduino ecosystem, I used the Arduino library option for this project. The library must be downloaded and installed in the Arduino IDE.
As the ESP32-CAM does not include a built-in programmer, an external one has to be used. I used an FTDI programmer (USB to TTL) to flash the code. The IO0 pin must be connected to GND to put the ESP32-CAM into programming mode.
Deploying an ML model on the ESP32-CAM was successful, even though I focused solely on classifying the images. Upon reviewing the code, I realized that localization is also possible, as the model returns the coordinates of the recognized objects.
I also played around with the ESP32's CPU performance by varying the clock frequency. At a lower frequency, i.e. 80 MHz, the inference time was around 300 ms. Setting the frequency to 240 MHz brought a significant reduction in inference time, to around 130 ms.
The initial compilation gave me an error related to the ESP32 firmware code.
c:\users\....\packages\esp32\tools\xtensa-esp32-elf-gcc\gcc8_4_0-esp-2021r2-patch3\xtensa-esp32-elf\include\c++\8.4.0\system_error:39:10: fatal error: bits/error_constants.h: No such file or directory
#include <bits/error_constants.h>
I noticed that the header file mentioned above was missing from the ESP32 Arduino core directory. I searched several forums until I found a solution on GitHub (link here). Adding the header file with the contents given at the link solved the problem.