A while ago, I decided to try out the ESP32-CAM example for Edge Impulse, but found it broken: a lot of library dependencies were missing. It turns out that Espressif had overhauled their esp-face repo into esp-dl and shifted focus to the more expensive ESP-EYE.
You can find instructions in the original repo or this article on how to train your own TinyML model on Edge Impulse. After downloading the deployed Arduino library, change the imported library name in my script to match yours.
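The only change needed is the include at the top of the sketch. The header name below is a hypothetical example; the actual name follows your own Edge Impulse project name.

```cpp
// Hypothetical example: the deployed Edge Impulse Arduino library exposes a
// single header named after your project. Replace it with your own.
#include <my_project_inferencing.h>  // <-- change this to your library's header
```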
I copied the necessary files and made it work. But it's still not an elegant solution: you have to view the image in a browser on a computer connected to the same (stable) WiFi, and you need to refresh the page (send a new request) to get a new photo.
So I decided to improve it and see if I could display the image directly on a TFT color display.
At first I wasn't successful, since most examples use TJpg_Decoder, which uses a lot of memory and caused the ESP32-CAM to crash and reboot. Then I found out that there's a function in the ESP32 camera library to convert JPEG into RGB565 (the format used by the Adafruit driver). It can even scale the image down to 1/2 of the side length (1/4 of the area), so it fits the ST7735S 160x128 or 128x128 displays nicely. Everything works and the problem is solved.
(No, I do not have access to Adafruit products, but I imagine they would work the same.)
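Here's a minimal sketch of that conversion step, assuming the esp32-camera library's img_converters.h (which provides jpg2rgb565) and the Adafruit ST7735 driver; the pin assignments and function name below are illustrative, not the exact code from my repo.

```cpp
#include <Adafruit_GFX.h>
#include <Adafruit_ST7735.h>
#include "esp_camera.h"
#include "img_converters.h"

// illustrative pin assignments for the ST7735S
#define TFT_CS   12
#define TFT_DC    2
#define TFT_RST  -1
Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);

// a 240x240 JPEG scaled down 2x becomes 120x120 RGB565 (2 bytes per pixel)
static uint8_t rgb565_buf[120 * 120 * 2];

void showFrame() {
  camera_fb_t *fb = esp_camera_fb_get();   // grab a JPEG frame from the camera
  if (!fb) return;

  // decode the JPEG and scale it to half the side length in one step
  bool ok = jpg2rgb565(fb->buf, fb->len, rgb565_buf, JPG_SCALE_2X);
  esp_camera_fb_return(fb);                // release the frame buffer right away
  if (!ok) return;

  // push the pixels to the display (some drivers expect the two bytes of
  // each RGB565 pixel swapped; swap them here if the colors look off)
  tft.drawRGBBitmap(0, 0, (uint16_t *)rgb565_buf, 120, 120);
}
```

Decoding into a fixed buffer of about 28 KB (120 x 120 x 2 bytes) fits comfortably in the ESP32's RAM, which avoids the large allocations that caused crashes with TJpg_Decoder.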
You can find more details about the wiring, the training data (Kaggle Cats and Dogs Dataset), and the model (MobileNetV1 96x96 0.25 with transfer learning) in my repo. There's also a copy of my model library and a boilerplate version (without the button and TFT).
The training accuracy is 89.8% and the test accuracy is 86.97% on Edge Impulse. The captured image is 240x240 (resized to 120x120 for the TFT and 96x96 for the model). Model inference (prediction) time on the ESP32-CAM is 2607 ms (~2.6 s). It's not fast, but the setup is so cheap that I think it could actually be useful in real-world applications...?
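For completeness, here's a minimal sketch of how the classification result and its timing can be read from the Edge Impulse SDK; the header name, buffer, and pixel-conversion callback are assumptions for illustration, not the exact code from my repo.

```cpp
#include <my_project_inferencing.h>   // hypothetical deployed library name

// 96x96 RGB565 pixels, already resized from the camera frame elsewhere
static uint16_t model_buf[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT];

// Edge Impulse pulls pixels through this callback as packed 0xRRGGBB floats
static int get_image_data(size_t offset, size_t length, float *out_ptr) {
  for (size_t i = 0; i < length; i++) {
    uint16_t p = model_buf[offset + i];
    uint8_t r = (p >> 11) << 3;           // expand RGB565 to RGB888
    uint8_t g = ((p >> 5) & 0x3F) << 2;
    uint8_t b = (p & 0x1F) << 3;
    out_ptr[i] = (r << 16) | (g << 8) | b;
  }
  return 0;
}

void classifyFrame() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
  signal.get_data = &get_image_data;

  ei_impulse_result_t result = {0};
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) return;

  // timing.classification holds the inference time in milliseconds
  Serial.printf("inference: %d ms\n", result.timing.classification);
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.printf("%s: %.2f\n", result.classification[i].label,
                  result.classification[i].value);
  }
}
```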