SynapEdge is a compiler that transforms ONNX models into plain C code, allowing deployment on any microcontroller without complex dependencies or hardware-specific requirements. This project showcases SynapEdge's capabilities by running the YOLOv5 object detection model on an ESP32-S3 microcontroller, demonstrating that it can compile sophisticated AI models for resource-limited edge devices. Although this YOLOv5 implementation isn't real-time, SynapEdge supports a wide variety of ONNX models, as long as their operators are supported (check the project for updates), enabling real-time applications such as MNIST classification, pattern detection in sensor data (e.g., accelerometers and gyroscopes), and crop health monitoring. This versatility opens up numerous possibilities for edge AI solutions.
Prerequisites
- An ESP32-S3 module with at least 16MB of flash memory and 8MB of PSRAM. Memory requirements depend on model size; inference of small models such as MNIST does not need this much memory.
- An LCD module compatible with the TFT_eSPI library (or another display library). Ensure your LCD is properly configured for whichever library you use.
This guide is tailored for the ESP32-S3 Dev (N16R8) Module. Ensure your microcontroller has sufficient flash and RAM for your project. Perform the following steps in your Arduino IDE:
Select the Board:
- Go to Tools > Board and select ESP32S3 Dev Module.
Configure Settings:
- Go to Tools, then:
- Set Flash Size to 16MB.
- Enable PSRAM.
Edit boards.txt:
- Locate the boards.txt file for the ESP32 package. For example:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\boards.txt
- Replace <your_username> with your actual Windows username.
- Open boards.txt in a text editor.
- Find the section starting with esp32s3.menu.PartitionScheme.
- At the end of this section, add the following lines:
esp32s3.menu.PartitionScheme.My_16MB=16M Flash (15MB APP)
esp32s3.menu.PartitionScheme.My_16MB.build.partitions=My_16MB
esp32s3.menu.PartitionScheme.My_16MB.upload.maximum_size=15728640
Create Partition Table:
- Create a file named My_16MB.csv with the following contents:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x5000,
otadata, data, ota, 0xe000, 0x2000,
app, app, factory, 0x10000, 0xF00000,
ffat, data, fat, 0xF10000, 0xE0000,
coredump, data, coredump, 0xFF0000, 0x10000,
- Save My_16MB.csv in the ESP32 partition folder:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\tools\partitions
- Replace <your_username> with your actual Windows username.
Restart the IDE:
- Close and re-open the Arduino IDE to apply the changes.
- Go to Tools and select My_16MB under Partition Scheme (if it does not appear, restart your PC).
Connect your LCD to the ESP32. This example uses an 8-bit parallel interface, but any supported LCD interface can be used.
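For reference, an 8-bit parallel display setup in TFT_eSPI is configured in the library's `User_Setup.h`. The fragment below is a sketch only: the driver and every pin number are placeholders, so substitute the values for your panel and wiring.

```cpp
// User_Setup.h fragment for TFT_eSPI in 8-bit parallel mode.
// Driver and pin assignments are EXAMPLES ONLY; use your own wiring.
#define ILI9341_DRIVER        // select the driver matching your panel
#define TFT_PARALLEL_8_BIT    // enable the 8-bit parallel interface

#define TFT_CS   33  // chip select
#define TFT_DC   15  // data/command
#define TFT_RST  32  // reset
#define TFT_WR    4  // write strobe
#define TFT_RD    2  // read strobe

#define TFT_D0   12  // data bus D0..D7
#define TFT_D1   13
#define TFT_D2   26
#define TFT_D3   25
#define TFT_D4   17
#define TFT_D5   16
#define TFT_D6   27
#define TFT_D7   14
```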
Compile Model
Use this notebook to compile the model.
Create an Arduino sketch
Download files
- Download the files `yolo5n.c`, `yolo5n.h`, and the weight files such as `yolo5n_weight_0.h`, `yolo5n_weight_1.h`, etc. from the notebook, then copy them all into your Arduino sketch folder. They should appear as tabs in the Arduino IDE.
- Rename `yolo5n.c` to `yolo5n.cpp` to make it compatible with the Arduino IDE, which expects C++ files.
- The ESP32 has a limited amount of internal SRAM, so we use the module's external PSRAM to hold the larger data structures; the tensor variables must therefore be allocated in PSRAM explicitly. Open `yolo5n.cpp` and add `#include "esp32-hal-psram.h"` at the top of the file. (Skip this step for small models such as MNIST.)
- In `yolo5n.cpp`, locate the forward pass function `forward_pass()`. At the beginning of this function, allocate every tensor union in PSRAM: for each union, use `union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0));` to allocate `tu0`, and so on. (Skip this step for small models such as MNIST.)
- At the end of the forward pass function, release the memory allocated for all tensor unions with `free(tu0);`, `free(tu1);`, etc. (Skip this step for small models such as MNIST.)
- Open the `yolo5n.h` header file and comment out all the static union declarations. For example, change `static union tensor_union_0 tu0;` to `//static union tensor_union_0 tu0;` to prevent static allocation in internal SRAM. (Skip this step for small models such as MNIST.)
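The allocate/use/free pattern from the steps above can be sketched generically. This uses standard `malloc` in place of `ps_malloc` so it runs anywhere, and a made-up `tensor_union_0`; the real unions are generated by SynapEdge in `yolo5n.h`.

```c
#include <stdlib.h>

/* Stand-in for a SynapEdge-generated tensor union (the real ones live in yolo5n.h). */
union tensor_union_0 {
    float conv_out[8 * 8];
    float relu_out[8 * 8];
};

/* Mirrors the edit to forward_pass(): heap-allocate the unions up front,
 * use them for the layer computations, and free them before returning.
 * On the ESP32, malloc() would be ps_malloc() to place the buffer in PSRAM. */
int forward_pass_sketch(void) {
    union tensor_union_0 *tu0 = (union tensor_union_0 *)malloc(sizeof(union tensor_union_0));
    if (tu0 == NULL) return -1;   /* allocation failed */
    tu0->conv_out[0] = 0.5f;      /* ... layer computations using tu0 ... */
    free(tu0);                    /* release at the end of the forward pass */
    return 0;
}
```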
- Create an `image.h` file in your sketch folder and add a header guard (e.g., `#ifndef IMAGE_H #define IMAGE_H ... #endif`).
- Define `#define I_HEIGHT 250` and `#define I_WIDTH 250` for the image dimensions.
- Convert your image into a C array using a tool like https://notisrac.github.io/FileToCArray/. Ensure the image format is RGB565 and resize it to `250x250`.
- Set the conversion settings to output `static const uint16_t images[] PROGMEM`.
- Change `images[]` to `images[I_HEIGHT][I_WIDTH]` to match the defined dimensions.
- Resize the image to 224x224 for the forward pass.
- Normalize the image to the range [0, 1].
- YOLOv5 expects input in the format `[batch][3][height][width]`.
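The `resizeImage()` function used later is not shown in the post; a minimal nearest-neighbour version consistent with the dimensions above could look like this (assuming a 250x250 RGB565 source and a 224x224 destination).

```c
#include <stdint.h>

#define I_HEIGHT 250   /* source image size, as defined in image.h */
#define I_WIDTH  250
#define DST_HEIGHT 224 /* network input size */
#define DST_WIDTH  224

/* Nearest-neighbour resize from the stored RGB565 image to the network
 * input size; output is a flat row-major buffer of DST_HEIGHT*DST_WIDTH pixels. */
void resizeImage(uint16_t input[I_HEIGHT][I_WIDTH], uint16_t *output) {
    for (int y = 0; y < DST_HEIGHT; ++y) {
        int sy = y * I_HEIGHT / DST_HEIGHT;   /* nearest source row */
        for (int x = 0; x < DST_WIDTH; ++x) {
            int sx = x * I_WIDTH / DST_WIDTH; /* nearest source column */
            output[y * DST_WIDTH + x] = input[sy][sx];
        }
    }
}
```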
void normalizeImage(uint16_t input[DST_HEIGHT][DST_WIDTH], float output[1][3][DST_HEIGHT][DST_WIDTH]) {
for (int i = 0; i < DST_HEIGHT; ++i) { // rows
for (int j = 0; j < DST_WIDTH; ++j) { // columns
// Extract RGB components from 16-bit RGB565
uint16_t pixel = input[i][j];
uint8_t r = (pixel >> 11) & 0x1F; // Red (5 bits, 0-31)
uint8_t g = (pixel >> 5) & 0x3F; // Green (6 bits, 0-63)
uint8_t b = pixel & 0x1F; // Blue (5 bits, 0-31)
// Normalize each channel to the [0, 1] range
output[0][0][i][j] = (float)r / 31.0f;
output[0][1][i][j] = (float)g / 63.0f;
output[0][2][i][j] = (float)b / 31.0f;
}
}
}
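The RGB565 unpacking can be checked in isolation (note that a 6-bit green channel has a maximum value of 63, while the 5-bit red and blue channels max out at 31):

```c
#include <stdint.h>

/* Unpack one RGB565 pixel into normalized [0, 1] channels. RGB565 packs
 * red in 5 bits (max 31), green in 6 bits (max 63), blue in 5 bits (max 31). */
void unpack_rgb565(uint16_t pixel, float *r, float *g, float *b) {
    *r = (float)((pixel >> 11) & 0x1F) / 31.0f;
    *g = (float)((pixel >> 5) & 0x3F) / 63.0f;
    *b = (float)(pixel & 0x1F) / 31.0f;
}
```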
uint16_t *resizedImage = (uint16_t *)ps_malloc((DST_WIDTH * DST_HEIGHT) * sizeof(uint16_t));
float (*normalized)[3][DST_HEIGHT][DST_WIDTH] = (float (*)[3][DST_HEIGHT][DST_WIDTH])ps_malloc(sizeof(float) * 3 * DST_HEIGHT * DST_WIDTH);
float (*output)[3087][85] = (float (*)[3087][85])ps_malloc(sizeof(float) * 3087 * 85);
resizeImage(*picture_1, resizedImage); // Resize the image for the forward pass
uint16_t (*resizedImage_2d)[DST_HEIGHT][DST_WIDTH] = (uint16_t (*)[DST_HEIGHT][DST_WIDTH])resizedImage;
normalizeImage(*resizedImage_2d, normalized);
Forward Pass
forward_pass(normalized, output); // Perform inference
Post Processing
- Parse the YOLOv5 output.
typedef struct {
float x, y, w, h;
float confidence;
float class_scores;
int class_id;
} Detection;
// Helper function to compute Intersection over Union (IoU) between two detections
float compute_iou(Detection a, Detection b) {
float x_left = fmaxf(a.x, b.x);
float y_top = fmaxf(a.y, b.y);
float x_right = fminf(a.x + a.w, b.x + b.w);
float y_bottom = fminf(a.y + a.h, b.y + b.h);
if (x_right < x_left || y_bottom < y_top)
return 0.0f;
float intersection_area = (x_right - x_left) * (y_bottom - y_top);
float area_a = a.w * a.h;
float area_b = b.w * b.h;
float union_area = area_a + area_b - intersection_area;
return intersection_area / union_area;
}
// Non-Maximum Suppression to filter out overlapping detections
void non_maximum_suppression(Detection detections[], int *det_count, float iou_threshold) {
// Simple O(n^2) NMS based on the combined score
for (int i = 0; i < *det_count; i++) {
// Skip suppressed detections (confidence == 0)
if (detections[i].confidence <= 0)
continue;
for (int j = i + 1; j < *det_count; j++) {
if (detections[j].confidence <= 0)
continue;
// If the boxes overlap more than the threshold, suppress the lower score box.
if (compute_iou(detections[i], detections[j]) > iou_threshold) {
// Here, we simply suppress detection j.
// You could also compare scores and choose which to keep.
detections[j].confidence = 0;
}
}
}
// Compact the detections array to remove suppressed detections
int new_count = 0;
for (int i = 0; i < *det_count; i++) {
if (detections[i].confidence > 0) {
detections[new_count++] = detections[i];
}
}
*det_count = new_count;
}
void parse_yolo_output(float output[NUM_BOXES][85], Detection detections[], int *det_count) {
*det_count = 0;
float scale_x = (float)original_width / (float)DST_WIDTH;
float scale_y = (float)original_height / (float)DST_HEIGHT;
for (int i = 0; i < NUM_BOXES; i++) {
float confidence = output[i][4]; // Objectness score
if (confidence < CONFIDENCE_THRESHOLD) continue;
detections[*det_count].confidence = confidence;
// Find the class with the highest score
float max_class_score = -INFINITY;
int class_id = -1;
for (int j = 5; j < 85; j++) {
if (output[i][j] > max_class_score) {
max_class_score = output[i][j];
class_id = j - 5;
}
}
float combined_score = max_class_score * confidence;
if (combined_score < 0.4f) continue;
// Extract bounding box parameters
float cx = output[i][0];
float cy = output[i][1];
float w = output[i][2];
float h = output[i][3];
// Convert to image coordinates
int x_min = (int)((cx - w / 2.0f) * scale_x);
int y_min = (int)((cy - h / 2.0f) * scale_y);
int x_max = (int)((cx + w / 2.0f) * scale_x);
int y_max = (int)((cy + h / 2.0f) * scale_y);
// Clamp coordinates to image dimensions
x_min = fmax(0, fmin(x_min, original_width - 1));
y_min = fmax(0, fmin(y_min, original_height - 1));
x_max = fmax(0, fmin(x_max, original_width - 1));
y_max = fmax(0, fmin(y_max, original_height - 1));
detections[*det_count].x = x_min;
detections[*det_count].y = y_min;
detections[*det_count].w = x_max - x_min; // Width
detections[*det_count].h = y_max - y_min; // Height
detections[*det_count].class_scores = combined_score;
detections[*det_count].class_id = class_id;
(*det_count)++;
}
non_maximum_suppression(detections, det_count, IOU_THRESHOLD);
}
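The per-box decoding (objectness times best class score) can also be checked in isolation. The hypothetical `decode_row()` below mirrors the argmax loop inside `parse_yolo_output()`:

```c
#include <math.h>

/* Decode one YOLOv5 output row (85 floats: cx, cy, w, h, objectness,
 * then 80 class scores). Returns the best class id when the combined
 * score (objectness * best class score) clears the threshold, else -1. */
int decode_row(const float row[85], float threshold, float *score_out) {
    float max_class_score = -INFINITY;
    int class_id = -1;
    for (int j = 5; j < 85; j++) {
        if (row[j] > max_class_score) {
            max_class_score = row[j];
            class_id = j - 5;
        }
    }
    float combined = max_class_score * row[4];
    if (score_out) *score_out = combined;
    return combined >= threshold ? class_id : -1;
}
```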
Find the code here.
Try to implement MNIST with a touch screen.