SynapEdge is a compiler that transforms ONNX models into plain C code, allowing deployment on any microcontroller without complex dependencies or hardware-specific requirements. This project showcases SynapEdge's capabilities by running the YOLOv5 object detection model on an ESP32-S3 microcontroller, demonstrating that it can compile sophisticated AI models for resource-limited edge devices. Although this YOLOv5 implementation isn't real-time, SynapEdge supports a wide variety of ONNX models, as long as their operators are supported (check the project for updates), enabling real-time applications such as MNIST classification, pattern detection in sensor data (e.g., accelerometers and gyroscopes), and crop health monitoring. This versatility opens up numerous possibilities for edge AI solutions.
Prerequisites
- An ESP32-S3 module with at least 16MB of flash memory and 8MB of PSRAM. Memory requirements depend on model size; inference of small models such as MNIST does not need this much memory.
- An LCD module compatible with the TFT_eSPI library (or another display library). Ensure your LCD is properly configured for whichever library you use.
This guide is tailored for the ESP32-S3 Dev (N16R8) Module. Ensure your microcontroller has sufficient flash and RAM for your project. Perform the following steps in your Arduino IDE:
Select the Board:
- Go to Tools > Board and select ESP32S3 Dev Module.
Configure Settings:
- Go to Tools, then:
- Set Flash Size to 16MB.
- Enable PSRAM.
Edit boards.txt:
- Locate the boards.txt file for the ESP32 package. For example:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\boards.txt
- Replace <your_username> with your actual Windows username.
- Open boards.txt in a text editor.
- Find the section starting with esp32s3.menu.PartitionScheme.
- At the end of this section, add the following lines:
esp32s3.menu.PartitionScheme.My_16MB=16M Flash (15MB APP)
esp32s3.menu.PartitionScheme.My_16MB.build.partitions=My_16MB
esp32s3.menu.PartitionScheme.My_16MB.upload.maximum_size=15728640
Create Partition Table:
- Create a file named My_16MB.csv with the following contents:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x5000,
otadata, data, ota, 0xe000, 0x2000,
app, app, factory, 0x10000, 0xF00000,
ffat, data, fat, 0xF10000, 0xE0000,
coredump, data, coredump, 0xFF0000, 0x10000,
- Save My_16MB.csv in the ESP32 partition folder:
C:\Users\<your_username>\AppData\Local\Arduino15\packages\esp32\hardware\esp32\2.0.11\tools\partitions
- Replace <your_username> with your actual Windows username.
Restart the IDE:
- Close and re-open the Arduino IDE to apply the changes.
- Go to Tools and select My_16MB under Partition Scheme (if it does not appear, restart your PC).
Connect your LCD to the ESP32. This example uses an 8-bit parallel interface, but any supported LCD interface can be used.
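For reference, an 8-bit parallel display setup in TFT_eSPI is configured in the library's `User_Setup.h`. The fragment below is a sketch only: the driver and every pin number are placeholders, so substitute the values for your panel and wiring.

```cpp
// User_Setup.h fragment for TFT_eSPI in 8-bit parallel mode.
// Driver and pin assignments are EXAMPLES ONLY; use your own wiring.
#define ILI9341_DRIVER        // select the driver matching your panel
#define TFT_PARALLEL_8_BIT    // enable the 8-bit parallel interface

#define TFT_CS   33  // chip select
#define TFT_DC   15  // data/command
#define TFT_RST  32  // reset
#define TFT_WR    4  // write strobe
#define TFT_RD    2  // read strobe

#define TFT_D0   12  // data bus D0..D7
#define TFT_D1   13
#define TFT_D2   26
#define TFT_D3   25
#define TFT_D4   17
#define TFT_D5   16
#define TFT_D6   27
#define TFT_D7   14
```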
Compile Model
Use this notebook to compile the model.
Create an Arduino sketch
Download files
- Download the files `yolo5n.c`, `yolo5n.h`, and the weight files such as `yolo5n_weight_0.h`, `yolo5n_weight_1.h`, etc. from the notebook, then copy them all into your Arduino sketch folder. They should appear as tabs in the Arduino IDE.
- Rename `yolo5n.c` to `yolo5n.cpp` to make it compatible with the Arduino IDE, which expects C++ files.
- The ESP32 has a limited amount of internal SRAM, so we use the module's external PSRAM to hold the larger data structures; the tensor variables must therefore be allocated in PSRAM explicitly. Open `yolo5n.cpp` and add `#include "esp32-hal-psram.h"` at the top of the file. (Skip this step for small models such as MNIST.)
- In `yolo5n.cpp`, locate the forward pass function `forward_pass()`. At the beginning of this function, allocate every tensor union in PSRAM: for each union, use `union tensor_union_0 *tu0 = (union tensor_union_0 *)ps_malloc(sizeof(union tensor_union_0));` to allocate `tu0`, and so on. (Skip this step for small models such as MNIST.)
- At the end of the forward pass function, release the memory allocated for all tensor unions with `free(tu0);`, `free(tu1);`, etc. (Skip this step for small models such as MNIST.)
- Open the `yolo5n.h` header file and comment out all the static union declarations. For example, change `static union tensor_union_0 tu0;` to `//static union tensor_union_0 tu0;` to prevent static allocation in internal SRAM. (Skip this step for small models such as MNIST.)
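The allocate/use/free pattern from the steps above can be sketched generically. This uses standard `malloc` in place of `ps_malloc` so it runs anywhere, and a made-up `tensor_union_0`; the real unions are generated by SynapEdge in `yolo5n.h`.

```c
#include <stdlib.h>

/* Stand-in for a SynapEdge-generated tensor union (the real ones live in yolo5n.h). */
union tensor_union_0 {
    float conv_out[8 * 8];
    float relu_out[8 * 8];
};

/* Mirrors the edit to forward_pass(): heap-allocate the unions up front,
 * use them for the layer computations, and free them before returning.
 * On the ESP32, malloc() would be ps_malloc() to place the buffer in PSRAM. */
int forward_pass_sketch(void) {
    union tensor_union_0 *tu0 = (union tensor_union_0 *)malloc(sizeof(union tensor_union_0));
    if (tu0 == NULL) return -1;   /* allocation failed */
    tu0->conv_out[0] = 0.5f;      /* ... layer computations using tu0 ... */
    free(tu0);                    /* release at the end of the forward pass */
    return 0;
}
```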
- Create an `image.h` file in your sketch folder and add a header guard (e.g., `#ifndef IMAGE_H #define IMAGE_H ... #endif`).
- Define `#define I_HEIGHT 250` and `#define I_WIDTH 250` for the image dimensions.
- Convert your image into a C array using a tool like https://notisrac.github.io/FileToCArray/. Ensure the image format is RGB565 and resize it to `250x250`.
- Set the conversion settings to output `static const uint16_t images[] PROGMEM`.
- Change `images[]` to `images[I_HEIGHT][I_WIDTH]` to match the defined dimensions.
- Resize the image to 224x224 for the forward pass.
- Normalize the image to the range [0, 1].
- YOLOv5 expects input in the format `[batch][3][height][width]`.
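The `resizeImage()` function used later is not shown in the post; a minimal nearest-neighbour version consistent with the dimensions above could look like this (assuming a 250x250 RGB565 source and a 224x224 destination).

```c
#include <stdint.h>

#define I_HEIGHT 250   /* source image size, as defined in image.h */
#define I_WIDTH  250
#define DST_HEIGHT 224 /* network input size */
#define DST_WIDTH  224

/* Nearest-neighbour resize from the stored RGB565 image to the network
 * input size; output is a flat row-major buffer of DST_HEIGHT*DST_WIDTH pixels. */
void resizeImage(uint16_t input[I_HEIGHT][I_WIDTH], uint16_t *output) {
    for (int y = 0; y < DST_HEIGHT; ++y) {
        int sy = y * I_HEIGHT / DST_HEIGHT;   /* nearest source row */
        for (int x = 0; x < DST_WIDTH; ++x) {
            int sx = x * I_WIDTH / DST_WIDTH; /* nearest source column */
            output[y * DST_WIDTH + x] = input[sy][sx];
        }
    }
}
```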
void normalizeImage(uint16_t input[DST_HEIGHT][DST_WIDTH], float output[1][3][DST_HEIGHT][DST_WIDTH]) {
for (int i = 0; i < DST_HEIGHT; ++i) { // rows
for (int j = 0; j < DST_WIDTH; ++j) { // columns
// Extract RGB components from 16-bit RGB565
uint16_t pixel = input[i][j];
uint8_t r = (pixel >> 11) & 0x1F; // Red (5 bits, 0-31)
uint8_t g = (pixel >> 5) & 0x3F; // Green (6 bits, 0-63)
uint8_t b = pixel & 0x1F; // Blue (5 bits, 0-31)
// Normalize each channel to the [0, 1] range
output[0][0][i][j] = (float)r / 31.0f;
output[0][1][i][j] = (float)g / 63.0f;
output[0][2][i][j] = (float)b / 31.0f;
}
}
}
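The RGB565 unpacking can be checked in isolation (note that a 6-bit green channel has a maximum value of 63, while the 5-bit red and blue channels max out at 31):

```c
#include <stdint.h>

/* Unpack one RGB565 pixel into normalized [0, 1] channels. RGB565 packs
 * red in 5 bits (max 31), green in 6 bits (max 63), blue in 5 bits (max 31). */
void unpack_rgb565(uint16_t pixel, float *r, float *g, float *b) {
    *r = (float)((pixel >> 11) & 0x1F) / 31.0f;
    *g = (float)((pixel >> 5) & 0x3F) / 63.0f;
    *b = (float)(pixel & 0x1F) / 31.0f;
}
```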
uint16_t *resizedImage = (uint16_t *)ps_malloc((DST_WIDTH * DST_HEIGHT) * sizeof(uint16_t));
float (*normalized)[3][DST_HEIGHT][DST_WIDTH] = (float (*)[3][DST_HEIGHT][DST_WIDTH])ps_malloc(sizeof(float) * 3 * DST_HEIGHT * DST_WIDTH);
float (*output)[3087][85] = (float (*)[3087][85])ps_malloc(sizeof(float) * 3087 * 85);
resizeImage(*picture_1, resizedImage); // Resize the image for the forward pass
uint16_t (*resizedImage_2d)[DST_HEIGHT][DST_WIDTH] = (uint16_t (*)[DST_HEIGHT][DST_WIDTH])resizedImage;
normalizeImage(*resizedImage_2d, normalized);
Forward Pass
forward_pass(normalized, output); // Perform inference
Post Processing
- Parse the YOLOv5 output.
typedef struct {
float x, y, w, h;
float confidence;
float class_scores;
int class_id;
} Detection;
// Helper function to compute Intersection over Union (IoU) between two detections
float compute_iou(Detection a, Detection b) {
float x_left = fmaxf(a.x, b.x);
float y_top = fmaxf(a.y, b.y);
float x_right = fminf(a.x + a.w, b.x + b.w);
float y_bottom = fminf(a.y + a.h, b.y + b.h);
if (x_right < x_left || y_bottom < y_top)
return 0.0f;
float intersection_area = (x_right - x_left) * (y_bottom - y_top);
float area_a = a.w * a.h;
float area_b = b.w * b.h;
float union_area = area_a + area_b - intersection_area;
return intersection_area / union_area;
}
// Non-Maximum Suppression to filter out overlapping detections
void non_maximum_suppression(Detection detections[], int *det_count, float iou_threshold) {
// Simple O(n^2) NMS based on the combined score
for (int i = 0; i < *det_count; i++) {
// Skip suppressed detections (confidence == 0)
if (detections[i].confidence <= 0)
continue;
for (int j = i + 1; j < *det_count; j++) {
if (detections[j].confidence <= 0)
continue;
// If the boxes overlap more than the threshold, suppress the lower score box.
if (compute_iou(detections[i], detections[j]) > iou_threshold) {
// Here, we simply suppress detection j.
// You could also compare scores and choose which to keep.
detections[j].confidence = 0;
}
}
}
// Compact the detections array to remove suppressed detections
int new_count = 0;
for (int i = 0; i < *det_count; i++) {
if (detections[i].confidence > 0) {
detections[new_count++] = detections[i];
}
}
*det_count = new_count;
}
void parse_yolo_output(float output[NUM_BOXES][85], Detection detections[], int *det_count) {
*det_count = 0;
float scale_x = (float)original_width / (float)DST_WIDTH;
float scale_y = (float)original_height / (float)DST_HEIGHT;
for (int i = 0; i < NUM_BOXES; i++) {
float confidence = output[i][4]; // Objectness score
if (confidence < CONFIDENCE_THRESHOLD) continue;
detections[*det_count].confidence = confidence;
// Find the class with the highest score
float max_class_score = -INFINITY;
int class_id = -1;
for (int j = 5; j < 85; j++) {
if (output[i][j] > max_class_score) {
max_class_score = output[i][j];
class_id = j - 5;
}
}
float combined_score = max_class_score * confidence;
if (combined_score < 0.4f) continue;
// Extract bounding box parameters
float cx = output[i][0];
float cy = output[i][1];
float w = output[i][2];
float h = output[i][3];
// Convert to image coordinates
int x_min = (int)((cx - w / 2.0f) * scale_x);
int y_min = (int)((cy - h / 2.0f) * scale_y);
int x_max = (int)((cx + w / 2.0f) * scale_x);
int y_max = (int)((cy + h / 2.0f) * scale_y);
// Clamp coordinates to image dimensions
x_min = fmax(0, fmin(x_min, original_width - 1));
y_min = fmax(0, fmin(y_min, original_height - 1));
x_max = fmax(0, fmin(x_max, original_width - 1));
y_max = fmax(0, fmin(y_max, original_height - 1));
detections[*det_count].x = x_min;
detections[*det_count].y = y_min;
detections[*det_count].w = x_max - x_min; // Width
detections[*det_count].h = y_max - y_min; // Height
detections[*det_count].class_scores = combined_score;
detections[*det_count].class_id = class_id;
(*det_count)++;
}
non_maximum_suppression(detections, det_count, IOU_THRESHOLD);
}
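The per-box decoding (objectness times best class score) can also be checked in isolation. The hypothetical `decode_row()` below mirrors the argmax loop inside `parse_yolo_output()`:

```c
#include <math.h>

/* Decode one YOLOv5 output row (85 floats: cx, cy, w, h, objectness,
 * then 80 class scores). Returns the best class id when the combined
 * score (objectness * best class score) clears the threshold, else -1. */
int decode_row(const float row[85], float threshold, float *score_out) {
    float max_class_score = -INFINITY;
    int class_id = -1;
    for (int j = 5; j < 85; j++) {
        if (row[j] > max_class_score) {
            max_class_score = row[j];
            class_id = j - 5;
        }
    }
    float combined = max_class_score * row[4];
    if (score_out) *score_out = combined;
    return combined >= threshold ? class_id : -1;
}
```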
Find the code here.
Try to implement MNIST with a touch screen.