2024 Winter Vacation Training: A Pet Expression Recognition System Based on the Seeed XIAO ESP32S3 Sense Development Board
Chapter 1 Introduction
1.1 Introduction to project functions
This project implements pet expression recognition based on object detection. More than 1,000 images were used for training, fine-tuning a larger pre-trained model on them to improve inference quality.
The ESP32S3 development board performs image recognition and inference with its camera, monitors pet expressions in real time, and uploads expressions such as "sad", "happy", and "angry" to a host computer, providing timely information to the user.
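As a sketch of how the host computer might consume these uploads: the serial line format used below ("expression:&lt;label&gt;,score:&lt;value&gt;") is a hypothetical protocol for illustration, not the project's actual wire format.

```python
# Hypothetical host-side monitor sketch. The "expression:<label>,score:<value>"
# line format is an assumption, not the project's actual serial protocol.

EXPRESSIONS = {"sad", "happy", "angry"}

def parse_report(line: str):
    """Parse one serial line into (label, score), or None if it is not a valid report."""
    try:
        expr_part, score_part = line.strip().split(",")
        label = expr_part.split(":")[1]
        score = float(score_part.split(":")[1])
    except (IndexError, ValueError):
        return None
    return (label, score) if label in EXPRESSIONS else None

# Simulated serial traffic instead of a real pyserial connection:
for raw in ["boot ok", "expression:happy,score:0.91", "expression:none,score:0.2"]:
    report = parse_report(raw)
    if report:
        print(f"pet is {report[0]} (confidence {report[1]:.2f})")
```

In a real deployment the loop body would read lines from a serial port instead of the simulated list.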
This project was built for platform 7 of the 2024 winter vacation training: https://www.eetree.cn/task/412
1.2 Design ideas
With the continued advance of technology, the smart home is no longer an unreachable dream. We aspire to create a home environment that not only makes our families more comfortable, safe, and convenient, but also lets our pet companions enjoy the same care and comfort from technology.
The Seeed XIAO ESP32S3 Sense development board has an onboard camera, and the ESP32-S3 chip is supported by embedded TinyML tools such as Edge Impulse and SenseCraft AI, giving it basic capability to deploy image recognition models.
Seeed Studio provides the ModelAssistant training framework, and models trained with it can be deployed to SenseCraft AI.
The online collaborative data annotation site Roboflow can quickly annotate images and generate datasets suitable for ModelAssistant.
In summary, eetree and Seeed Studio provide tools covering data labeling, data preprocessing, model training, model fine-tuning, and model deployment around an animal expression dataset, finally realizing pet expression recognition based on object detection.
1.3 Hardware Block Diagram
XIAO ESP32S3 / XIAO ESP32S3 Sense front indication diagram
XIAO ESP32S3 / XIAO ESP32S3 Sense rear indication diagram
XIAO ESP32S3 / XIAO ESP32S3 Sense pin list
1.4 Software Flow Chart
1.5 Acquisition of project-related files
Pet Emotion Dataset (labeled). This dataset is organized by label; with the automatic labeling described below, more than a thousand pictures can be quickly exported as a training set.
Web link:
Link: pan.quark.cn/s/91d5bb453dd8
Extraction code: HH3G
Content uploaded to the website, ArduinoESP32S3_image.zip:
It contains the camera code corresponding to "camera code debugging" and the code "esp32_camera_test_pe.ino" used to deploy the AI model.
There are also official tutorials for the pre-training framework: "Classification Google-Colab-PFLD-Grove-Example.ipynb" and "Target Detection Google-Colab-SWFIT-YOLO-A1101-Example.ipynb".
It also provides "1epoch_5_int8.tflite", an anger-level detection model that can be deployed directly to SenseCraft AI, and "4epoch_5_int8.tflite", a pet expression detection model.
Chapter 2: Introduction to Hardware
2.1 Characteristics
Seeed Studio XIAO is a series of small development boards sharing a similar hardware architecture, each about the size of a thumb. Here "XIAO" refers both to its size ("xiao" meaning small) and to its power. The Seeed Studio XIAO ESP32S3 Sense integrates a camera sensor, a digital microphone, and SD card support. Combined with embedded ML computing power and imaging capability, this development board is an excellent tool for getting started with intelligent voice and visual AI.
Processor: ESP32-S3R8
Xtensa LX7 dual-core 32-bit processor, operating up to 240 MHz
Interfaces: 1x UART, 1x IIC, 1x IIS, 1x SPI, 11x GPIOs (PWM), 9x ADC, 1x user LED, 1x charge LED, 1x B2B connector (with 2 additional GPIOs)
1x reset button, 1x boot button
Wireless: complete 2.4 GHz Wi-Fi subsystem; BLE: Bluetooth 5.0, Bluetooth Mesh
Size: 21 x 17.5 x 15 mm (with expansion board)
Built-in sensors: OV2640 camera sensor (1600x1200), digital microphone
Input voltage: 5 V (Type-C), 4.2 V (BAT)
Memory: 8 MB PSRAM and 8 MB flash on chip
Onboard SD card slot, supports up to 32 GB (FAT)
Operating temperature: -40°C ~ 65°C
Powerful MCU board: ESP32-S3 32-bit dual-core Xtensa processor chip running at up to 240 MHz, with multiple development ports, supporting Arduino/MicroPython
Advanced features: detachable OV2640 camera sensor with 1600x1200 resolution, compatible with the OV5640 camera sensor; built-in digital microphone
Well-designed power supply: lithium battery charge management with 4 power modes, deep sleep current down to 14 μA
More memory, more possibilities: 8 MB PSRAM and 8 MB flash, plus an SD card slot supporting up to 32 GB (FAT) of external storage
Excellent RF performance: supports 2.4 GHz Wi-Fi and BLE dual wireless communication, and 100 m+ remote communication when connected to a U.FL antenna
Compact design: 21 x 17.5 mm, in the classic XIAO form factor, for space-constrained projects such as wearables
Pre-trained AI models from SenseCraft AI for no-code deployment
2.2 Reference links
· [Wiki] Get started with Seeed Studio XIAO ESP32S3 (Sense)
· Hardware basics (Sense version)
  o Camera usage
  o Microphone usage
  o SD card slot usage and file system
· A collection of tutorials and projects based on Seeed Studio XIAO
· XIAO SoM user manual
· TinyML tutorials on the XIAO ESP32S3 (Sense)
· XIAO ESP32S3 Sense becomes a community board supported by Edge Impulse
· PlatformIO adds support for XIAO ESP32S3
· XIAO ESP32S3 with MicroPython
· SenseCraft AI
Chapter 3 Functions and Picture Display
3.1 Anger recognition
This section implements object detection for anger only, evaluating anger level from the pet's expression:
3.2 Pet Expression Recognition
This part can recognize four expressions.
Chapter 4 Main Code Fragments and Descriptions
4.1 Roboflow and SenseCraft AI
4.1.1 Implementation of the Online Deployment Scheme: AI-Assisted Labeling Tutorial
Create a project and select object detection; classification projects cannot be trained in the Baidu PaddlePaddle (AI Studio) notebook.
Create several labels.
Upload the dataset. The provided dataset is already labeled, which makes the system's automatic labeling very convenient.
First click "Save and Continue", then upload the images to use automatic labeling.
It is recommended to select four random images to inspect the labeling results.
Click "Edit All".
Delete tags until only one is left.
Select four pictures to test the effect. Since each import carries a single class label, lower the confidence threshold; all that is needed is to detect and tag the specific class. Then click "Auto Label".
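The idea behind this step can be sketched in a few lines: keep only detections of the single target class whose confidence clears a (deliberately lowered) threshold. The detection dictionaries below are hypothetical illustrations, not Roboflow's actual API objects.

```python
# Illustrative sketch of the auto-label step: filter detections down to one
# target class above a lowered confidence threshold. The detection dicts
# are made up for illustration, not Roboflow's actual API objects.

def auto_label(detections, target_class, conf_threshold=0.3):
    """Return detections of target_class whose confidence clears the threshold."""
    return [d for d in detections
            if d["class"] == target_class and d["confidence"] >= conf_threshold]

detections = [
    {"class": "happy", "confidence": 0.42, "bbox": (10, 12, 80, 90)},
    {"class": "happy", "confidence": 0.21, "bbox": (5, 5, 60, 70)},
    {"class": "sad",   "confidence": 0.88, "bbox": (0, 0, 50, 50)},
]
kept = auto_label(detections, "happy", conf_threshold=0.3)
print(len(kept))  # only the first "happy" box survives
```

Lowering `conf_threshold` trades some false positives for fewer missed boxes, which is acceptable here since each batch contains only one class.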
Note: AI-assisted labeling runs in the cloud, so you can simply close your browser.
After each round of labeling completes, an email reminder is sent. As of this writing (March 2024) the feature is in free beta and costs nothing to use.
When you receive the email notification that labeling is done, click "Review".
Click "Accept", then add the results to the dataset; this updates the dataset.
For the principle of automatic labeling above, you can view this link: blog.roboflow.com/yolo-world-prompting-tips/
Based on the above scheme, the dataset can be expanded rapidly and almost without limit.
After the import completes, click "Rebalance" to re-proportion the splits to meet the training framework's requirements.
Here the split was changed to 7:3 (train:valid) with no test set.
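The rebalance step amounts to a shuffled 7:3 split with no test partition; a minimal sketch (the file names are made up):

```python
# Minimal sketch of the "rebalance" step: shuffle and split image file
# names 7:3 into train/valid with no test set. File names are illustrative.
import random

def rebalance(images, train_ratio=0.7, seed=42):
    """Return (train, valid) lists in a train_ratio : (1 - train_ratio) split."""
    imgs = images[:]
    random.Random(seed).shuffle(imgs)  # deterministic shuffle for reproducibility
    cut = int(len(imgs) * train_ratio)
    return imgs[:cut], imgs[cut:]

images = [f"pet_{i:03d}.jpg" for i in range(1000)]
train, valid = rebalance(images)
print(len(train), len(valid))  # 700 300
```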
For the remaining options simply click "Continue", then "Create". Then export:
Select the COCO format:
Get the code: app.roboflow.com/ds/idWpYZYX6e?key=6dUnJDONfV
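For reference, the exported COCO annotations are a single JSON file. A minimal sketch of its structure and how a training loader would read it (the file name, category, and box values here are illustrative only):

```python
# Minimal sketch of the COCO annotation format produced by the export.
# The image/category entries below are illustrative, not the real dataset.
import json

coco = {
    "images": [{"id": 1, "file_name": "pet_001.jpg", "width": 192, "height": 192}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [10, 12, 80, 90],  # [x, y, width, height]
                     "area": 7200, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "happy"}],
}

# Round-trip through JSON, as if reading the exported file from disk:
loaded = json.loads(json.dumps(coco))

# Map category ids to names, as a training loader would:
id_to_name = {c["id"]: c["name"] for c in loaded["categories"]}
for ann in loaded["annotations"]:
    print(id_to_name[ann["category_id"]], ann["bbox"])
```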
4.1.2 Training
Go to the PaddlePaddle community: PaddlePaddle AI Studio Xinghe Community, an AI learning and training community (baidu.com).
Refer to this documentation tutorial:
https://doc.weixin.qq.com/doc/w3_AeEAVwagAOc4L8kgTHcRAeB09lxvL?scode=AGEAZwfLABEAovuSS1AeEAVwagAOc
Search for "ModelAssistant" in public projects. Since this project uses more than 1,000 images, it runs on a server billed at 2 compute points per hour.
Following the steps requires modifying this location for the code download.
Also modify the number of classes: %env NUM_CLASSES=4
Note that every time the environment restarts, the dependencies must be reinstalled. Some libraries may change the PaddlePaddle configuration, and if saving fails the environment has to be redeployed.
After each cell has been run in turn, download the model from the following location.
/home/aistudio/ModelAssistant/work_dirs/swift_yolo_tiny_1xb16_300e_coco/epoch_5_int8.tflite
Code snippet analysis
The following is the training part of the model.
num_classes specifies how many classes the data is divided into.
!python tools/train.py \
configs/swift_yolo/swift_yolo_tiny_1xb16_300e_coco.py \
--cfg-options \
epochs=10 \
num_classes=${NUM_CLASSES} \
workers=1 \
imgsz=192,192 \
data_root=${DATA_ROOT} \
load_from=https://files.seeedstudio.com/sscma/model_zoo/detection/person/person_detection.pth
Verify the model after training.
! python tools/inference.py \
configs/swift_yolo/swift_yolo_tiny_1xb16_300e_coco.py \
"$(cat work_dirs/swift_yolo_tiny_1xb16_300e_coco/last_checkpoint)" \
--dump work_dirs/swift_yolo_tiny_1xb16_300e_coco/last_checkpoint.pkl \
--cfg-options \
data_root=${DATA_ROOT} \
num_classes=${NUM_CLASSES} \
workers=1 \
imgsz=192,192
Next, export the training results as a deployable file. The export location is /home/aistudio/ModelAssistant/work_dirs/swift_yolo_tiny_1xb16_300e_coco/epoch_5_int8.tflite.
! python tools/export.py \
configs/swift_yolo/swift_yolo_tiny_1xb16_300e_coco.py \
$(cat work_dirs/swift_yolo_tiny_1xb16_300e_coco/last_checkpoint) \
--cfg-options \
data_root=${DATA_ROOT} \
num_classes=${NUM_CLASSES} \
imgsz=192,192
4.1.3 Deployment
Deployment uses this tool: SenseCraft AI (seeed-studio.github.io)
https://seeed-studio.github.io/SenseCraft-Web-Toolkit/#/setup/process
Select the exported epoch_5_int8.tflite, upload and deploy it, then run inference.
4.2 Edge Impulse and Arduino IDE solution
First import the libraries needed for the AI model:
#include <Pet_Expression_inferencing.h>
#include "edge-impulse-sdk/dsp/image/image.hpp"
Run model inference:
// Run the classifier
ei_impulse_result_t result = { 0 };
EI_IMPULSE_ERROR err = run_classifier(&signal, &result, debug_nn);
if (err != EI_IMPULSE_OK) {
ei_printf("ERR: Failed to run classifier (%d)\n", err);
return;
}
// print the predictions
ei_printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n",
result.timing.dsp, result.timing.classification, result.timing.anomaly);
Chapter 5: Main Problems and Solutions
5.1 Roboflow and SenseCraft AI: AI-assisted data annotation
In the dataset preprocessing stage, Roboflow's preprocessing was too cumbersome, requiring every picture to be labeled by hand. After Chinese New Year, the platform launched the AI-assisted labeling feature, making it possible to label more than a thousand pictures.
5.2 Edge Impulse and Arduino IDE solution
5.2.1 Preamble and reasons for abandoning this approach
This section additionally records debugging of the Edge Impulse scheme. The compilation process may produce errors, and even after a successful compile the program may fail to run, for example when the camera cannot connect. Even when it runs, the recognition quality is not good enough, because the approach amounts to fine-tuning a pre-trained model with only a few images, which is difficult to improve further by parameter tuning.
A feasible way around these difficulties would be for the Edge Impulse platform to update its support, including ESP-NN; on the user's side, the training set can be enlarged. Of course, since Edge Impulse fine-tunes a larger model with a small training set, some niche image recognition tasks (not limited to classification and object detection) cannot be recognized well by a general image model, and other TinyML schemes must then be sought. As a "pet expression detection" project, which is relatively uncommon in past competitions and open-source projects, this scheme's recognition quality was not good, so the results finally shown in this article come from the Roboflow and SenseCraft AI scheme described in sections 4.1 and 5.1.
This project refers to the following tutorials to form a more comprehensive model deployment technology route:
https://www.hackster.io/mjrobot/tinyml-made-easy-image-classification-cb42ae
This section refers to C code for the following projects:
https://github.com/Mjrovai/XIAO-ESP32S3-Sense
Edge Impulse website tutorial:
https://studio.edgeimpulse.com/studio/profile/projects?autoredirect=1
5.2.2 Hardware preparation
Memory cards should be formatted.
5.2.3 Data set preprocessing
This section is based on the following image dataset:
https://drive.weixin.qq.com/s?k=AGEAZwfLABEKM9bBhVATkA4QaAAGk#/preview?fileId=i.1970325135517537.1688855925285177_f.6983006661lpd
At the same time, because the quality of datasets found online is limited, it may be better to use the following photo-capture program and have the Seeed XIAO take photos to build a dataset.
https://drive.weixin.qq.com/s?k=AGEAZwfLABEauo00pUATkA4QaAAGk
5.2.4 Online training
In the early classification exploration stage, several online training schemes were tried, but the results were unsatisfactory.
In the end, the following site was used.
Create a project.
Click "Add existing data" to upload.
After uploading:
Training setup:
Set the image color depth to grayscale.
Generate the results:
Using MobileNet V1 with α=0.10 (about 53.2 K RAM and 101 K ROM) gives a smaller footprint at the other extreme. When this project was first run on the ESP32-CAM, a low-complexity model was adopted, which guaranteed low inference latency at the cost of accuracy. For the first test, this model design (MobileNet V1, α=0.10) is retained.
The final training settings are as follows
training results
The ESP32-S3 is known to have 8 MB PSRAM and 8 MB flash on board, so a better model can be substituted.
MobileNetV2 96x96 0.35 was selected for a second round of training.
The results of this training are acceptable.
Next, export the model
5.2.5 Flashing complete
The sketch uses 467553 bytes (13%) of program storage space; the maximum is 3342336 bytes.
Global variables use 32300 bytes (9%) of dynamic memory, leaving 295380 bytes for local variables; the maximum is 327680 bytes.
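A quick arithmetic check of these build-report figures (note the Arduino IDE truncates the percentages) shows the model leaves ample headroom on both flash and RAM:

```python
# Sanity-check the flash/RAM usage figures reported by the Arduino IDE build.
flash_used, flash_max = 467553, 3342336
ram_used, ram_max = 32300, 327680

flash_pct = flash_used / flash_max * 100
ram_pct = ram_used / ram_max * 100
print(f"flash: {flash_pct:.1f}% used")  # ~14.0% exact; the IDE truncates to 13%
print(f"ram:   {ram_pct:.1f}% used, {ram_max - ram_used} bytes left for locals")
```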
This article believes that the ESP development board's strong IoT features, together with full official support, make it possible to link it into a smart home and realize more functions.
With a mobile phone as the central controller, leveraging the phone's computing power while the microcontroller serves as an edge sensor, more demanding tasks such as AI or graphics computation can be realized.
Based on a large-model API, an intelligent dial built on the chip could host several AI applications, such as general-purpose text question answering, text-to-image, and text-to-video.
Finally, the ESP32-S3 chip has a degree of AI inference capability and can serve some edge computing application scenarios; it is a good development board.