This is my second project with Google Summer of Code (GSoC) under TensorFlow. There was no proper documentation on the internet for building a custom image recognition TinyML model, so my GSoC mentor, Paul Ruiz, suggested that I try to solve it. Here's how you could also build an image recognition TinyML application. Happy tinkering!
Click here to view my first GSoC project!
The idea behind the project:
I wanted to work on a problem with fewer variables, as the documentation on how to work with the camera module and process its data wasn't great. I chose to build an MNIST TinyML model: in this case I wouldn't need to worry about the training data set, and I could focus on the essential parts of the project to get things up and running. But now that I have figured out all the parts needed for a custom image recognition project, I have also documented how to collect training data sets using the camera module.
The theme/tone for the blog?
I want to warn you that this blog might get a bit complex to understand. There's a good reason for this: with an accelerometer-based application, it's easy to do sanity checks by just printing the accelerometer values of one axis to the serial monitor or plotter. In contrast, sanity checks for an image recognition application are at least 10x more tiresome, because you cannot visualize in real time whether a piece of code is doing the desired action.
Some Comments
This blog might be a bit hard to understand due to the complexity of unit testing. I want to address any gaps in the explanation with feedback from readers, so comment below with your doubts and questions about anything related to image recognition on embedded systems.
Does TinyML make sense at all?
I recommend reading this fantastic article by Pete Warden, the author of the TinyML book, to understand why running machine learning models on microcontrollers makes sense and is the future of machine learning.
Even if TinyML makes sense, does image recognition make sense on TinyML?
The full VGA (640×480) output from the OV7670 camera we'll be using here is too big for current TinyML applications. uTensor runs handwriting detection with MNIST on 28×28 images. The person detection example in TensorFlow Lite for Microcontrollers uses 96×96 images, which is more than enough. Even state-of-the-art 'Big ML' applications often use only 320×320 images. In conclusion, running image recognition applications on tiny microcontrollers makes a lot of sense.
- Integration time!
- Problems with the project / How to improve the project
- Some helpful pointers to building your own image recognition project
- Collecting training data using the OV7670 camera module
- Conclusion
11.a TinyML Model: Cropped input data
GitHub link for this subsection.
Code explanation:
Camera.readFrame(pixels);
This line of code reads one frame from the camera and stores it in the pixels array.
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    pixel = pixels[176 * i + j];
    tft.drawPixel(i, j, pixel);
  }
}
delay(1000);
These lines loop through the pixels array, crop a 28x28 image from the top-left corner of the 176x144 QCIF frame, and display it on the screen.
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    pixel = pixels[176 * i + j];
    red = ((pixel >> 11) & 0x1f) << 3;
    green = ((pixel >> 5) & 0x3f) << 2;
    blue = ((pixel >> 0) & 0x1f) << 3;
    grayscale = (red + blue + green) / 3;
    if (grayscale < 128) {
      grayscale = 0;
    }
    tflInterpreter->input(0)->data.f[28 * i + j] = grayscale / 255;
    Serial.println(grayscale);
  }
}
These lines loop through the pixels array, crop the same 28x28 image, convert each RGB565 pixel to grayscale, zero out values below 128 to suppress the background, and send the normalized result as input to the TinyML model.
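To make the RGB565 unpacking above concrete: each 16-bit pixel packs red in the top 5 bits, green in the middle 6, and blue in the bottom 5, so shifting each field back up approximates an 8-bit channel. Here is the same conversion pulled out into a minimal helper (the function name is mine, not from the project code):
// Convert one RGB565 pixel to an 8-bit grayscale value.
// Example: rgb565ToGray(0xF800) (pure red) returns (248 + 0 + 0) / 3 = 82.
uint8_t rgb565ToGray(uint16_t pixel) {
  uint8_t red   = ((pixel >> 11) & 0x1f) << 3;  // top 5 bits, scaled to 0-248
  uint8_t green = ((pixel >> 5) & 0x3f) << 2;   // middle 6 bits, scaled to 0-252
  uint8_t blue  = ((pixel >> 0) & 0x1f) << 3;   // bottom 5 bits, scaled to 0-248
  return (red + green + blue) / 3;              // simple average of the channels
}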
The sketch:
// MNIST image recognition model sketch (cropped input)
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"
#include "model.h"
#include <Adafruit_GFX.h> // Core graphics library
#include <Adafruit_ST7735.h> // Hardware-specific library for ST7735
#include <SPI.h>
#include <Arduino_OV767X.h>
const tflite::Model* tflModel = nullptr;
tflite::ErrorReporter* tflErrorReporter = nullptr;
TfLiteTensor* tflInputTensor = nullptr;
TfLiteTensor* tflOutputTensor = nullptr;
tflite::MicroInterpreter* tflInterpreter = nullptr;
#define TFT_CS A7
#define TFT_RST 7 // Or set to -1 and connect to Arduino RESET pin
#define TFT_DC A6
constexpr int tensorArenaSize = 140 * 1024;
uint8_t tensorArena[tensorArenaSize];
float out[10];
uint16_t pixels[176*144];
uint16_t color, pixel;
uint8_t red, blue, green;
float grayscale;
Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
void setup() {
Serial.begin(115200);
while (!Serial)
delay(10);
tft.initR(INITR_BLACKTAB);
delay(100);
if (!Camera.begin(QCIF, RGB565, 1)) {
Serial.println("Failed to initialize camera!");
while (1);
}
Serial.println(F("Initialized"));
static tflite::MicroErrorReporter micro_error_reporter;
tflErrorReporter = &micro_error_reporter;
tflModel = tflite::GetModel(model);
if (tflModel->version() != TFLITE_SCHEMA_VERSION) {
TF_LITE_REPORT_ERROR(tflErrorReporter,
"Model provided is schema version %d not equal "
"to supported version %d.",
tflModel->version(), TFLITE_SCHEMA_VERSION);
return;
}
static tflite::MicroMutableOpResolver<6> micro_op_resolver;
micro_op_resolver.AddMaxPool2D();
micro_op_resolver.AddConv2D();
micro_op_resolver.AddDepthwiseConv2D();
micro_op_resolver.AddFullyConnected();
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax();
static tflite::MicroInterpreter static_interpreter(tflModel, micro_op_resolver, tensorArena, tensorArenaSize, tflErrorReporter);
tflInterpreter = &static_interpreter;
TfLiteStatus allocate_status = tflInterpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
TF_LITE_REPORT_ERROR(tflErrorReporter, "AllocateTensors() failed");
return;
}
tflInputTensor = tflInterpreter->input(0);
tft.fillScreen(ST77XX_BLACK);
delay(100);
tft.fillScreen(ST77XX_BLACK);
}
void loop() {
Camera.readFrame(pixels);
// Crop the top-left 28x28 corner of the frame and show it on the TFT
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    pixel = pixels[176 * i + j];
    tft.drawPixel(i, j, pixel);
  }
}
delay(1000);
// Convert the cropped pixels to grayscale, threshold the background,
// normalize to [0, 1], and copy them into the model's input tensor
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    pixel = pixels[176 * i + j];
    red = ((pixel >> 11) & 0x1f) << 3;
    green = ((pixel >> 5) & 0x3f) << 2;
    blue = ((pixel >> 0) & 0x1f) << 3;
    grayscale = (red + blue + green) / 3;
    if (grayscale < 128) {
      grayscale = 0;
    }
    tflInterpreter->input(0)->data.f[28 * i + j] = grayscale / 255;
    Serial.println(grayscale);
  }
}
delay(1000);
TfLiteStatus invokeStatus = tflInterpreter->Invoke();
// Copy the 10 class scores out of the output tensor
for (int k = 0; k < 10; k++) {
  out[k] = tflInterpreter->output(0)->data.f[k];
}
// Find the class with the highest score
float maxVal = out[0];
int maxIndex = 0;
for (int k = 0; k < 10; k++) {
  if (out[k] > maxVal) {
    maxVal = out[k];
    maxIndex = k;
  }
}
Serial.print("Number ");
Serial.print(maxIndex);
Serial.println(" detected");
Serial.print("Confidence: ");
Serial.println(maxVal);
}
11.b TinyML Model: Reshaped input data
GitHub link for this subsection.
Code explanation:
Camera.readFrame(pixels);
This line of code reads one frame from the camera and stores it in the pixels array.
for (int i = 0; i < 112; i++) {
  for (int j = 0; j < 112; j++) {
    tft.drawPixel(i, j, pixels[176 * i + j]);
    Serial.print("");
  }
}
These lines loop through the pixels array, crop a 112x112 image from the top-left of the frame, and display it on the screen.
Serial.println("");
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    int sum = 0;
    for (int k = 0; k < 4; k++) {
      for (int l = 0; l < 4; l++) {
        sum += pixels[4 * (176 * i + j) + 176 * k + l];
      }
    }
    sum = sum / 16;
    tflInterpreter->input(0)->data.f[28 * i + j] = float(sum / 255.0);
    Serial.print(sum);
    Serial.print(", ");
  }
  Serial.println("");
}
These lines loop through the pixels array, take the 112x112 crop, downsample it to 28x28 by averaging each 4x4 block of pixels, and send the result as input to the TinyML model.
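If the indexing expression looks opaque: 4*(176*i + j) + 176*k + l is just the flattened index of pixel (4i + k, 4j + l) in the 176-pixel-wide frame, since 176*(4i + k) + (4j + l) = 704i + 4j + 176k + l = 4*(176i + j) + 176k + l. Each output pixel (i, j) therefore averages the 4x4 block whose top-left corner is at (4i, 4j). Here is the same arithmetic pulled out into a helper for clarity (the function name is mine, not from the project code):
// Average-pool the 112x112 top-left region of a 176-wide frame down to 28x28.
// src is the (grayscaled) camera frame; dst receives 28*28 floats in [0, 1].
void downsample4x4(const uint16_t* src, float* dst) {
  for (int i = 0; i < 28; i++) {
    for (int j = 0; j < 28; j++) {
      int sum = 0;
      for (int k = 0; k < 4; k++) {
        for (int l = 0; l < 4; l++) {
          // Index of pixel (4*i + k, 4*j + l):
          // 176*(4*i + k) + (4*j + l) == 4*(176*i + j) + 176*k + l
          sum += src[4 * (176 * i + j) + 176 * k + l];
        }
      }
      dst[28 * i + j] = (sum / 16) / 255.0f;  // block average, then normalize
    }
  }
}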
The sketch:
// MNIST image recognition model sketch (reshaped input)
#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"
#include "model.h"
#include <Adafruit_GFX.h> // Core graphics library
#include <Adafruit_ST7735.h> // Hardware-specific library for ST7735
#include <SPI.h>
#include <Arduino_OV767X.h>
const tflite::Model* tflModel = nullptr;
tflite::ErrorReporter* tflErrorReporter = nullptr;
TfLiteTensor* tflInputTensor = nullptr;
TfLiteTensor* tflOutputTensor = nullptr;
tflite::MicroInterpreter* tflInterpreter = nullptr;
#define TFT_CS A7
#define TFT_RST 7 // Or set to -1 and connect to Arduino RESET pin
#define TFT_DC A6
constexpr int tensorArenaSize = 140 * 1024;
uint8_t tensorArena[tensorArenaSize];
float out[10];
uint16_t pixels[176*144];
uint16_t color, pixel;
uint8_t red, blue, green;
int grayscale;
Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
void setup() {
Serial.begin(9600);
while (!Serial)
delay(10);
tft.initR(INITR_BLACKTAB);
delay(1000);
if (!Camera.begin(QCIF, RGB565, 1)) {
Serial.println("Failed to initialize camera!");
while (1);
}
Serial.println(F("Initialized"));
static tflite::MicroErrorReporter micro_error_reporter;
tflErrorReporter = &micro_error_reporter;
tflModel = tflite::GetModel(model);
if (tflModel->version() != TFLITE_SCHEMA_VERSION) {
TF_LITE_REPORT_ERROR(tflErrorReporter,
"Model provided is schema version %d not equal "
"to supported version %d.",
tflModel->version(), TFLITE_SCHEMA_VERSION);
return;
}
static tflite::MicroMutableOpResolver<6> micro_op_resolver;
micro_op_resolver.AddMaxPool2D();
micro_op_resolver.AddConv2D();
micro_op_resolver.AddDepthwiseConv2D();
micro_op_resolver.AddFullyConnected();
micro_op_resolver.AddReshape();
micro_op_resolver.AddSoftmax();
static tflite::MicroInterpreter static_interpreter(tflModel, micro_op_resolver, tensorArena, tensorArenaSize, tflErrorReporter);
tflInterpreter = &static_interpreter;
TfLiteStatus allocate_status = tflInterpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
TF_LITE_REPORT_ERROR(tflErrorReporter, "AllocateTensors() failed");
return;
}
tflInputTensor = tflInterpreter->input(0);
tft.fillScreen(ST77XX_BLACK);
delay(100);
}
void loop() {
Camera.readFrame(pixels);
tft.fillScreen(ST77XX_BLACK);
// Crop the top-left 112x112 region of the frame and show it on the TFT
for (int i = 0; i < 112; i++) {
  for (int j = 0; j < 112; j++) {
    tft.drawPixel(i, j, pixels[176 * i + j]);
    Serial.print("");  // empty print, effectively a no-op
  }
}
// delay(1000);
// Convert the 112x112 crop to grayscale in place, zeroing values below 160
for (int i = 0; i < 112; i++) {
  for (int j = 0; j < 112; j++) {
    pixel = pixels[176 * i + j];
    red = ((pixel >> 11) & 0x1f) << 3;
    green = ((pixel >> 5) & 0x3f) << 2;
    blue = ((pixel >> 0) & 0x1f) << 3;
    grayscale = (red + blue + green) / 3;
    if (grayscale < 160) {
      grayscale = 0;
    }
    pixels[176 * i + j] = grayscale;
  }
}
Serial.println("");
// Downsample the 112x112 crop to 28x28 by averaging each 4x4 block,
// then normalize and copy into the model's input tensor
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    int sum = 0;
    for (int k = 0; k < 4; k++) {
      for (int l = 0; l < 4; l++) {
        sum += pixels[4 * (176 * i + j) + 176 * k + l];
      }
    }
    sum = sum / 16;
    tflInterpreter->input(0)->data.f[28 * i + j] = float(sum / 255.0);
    Serial.print(sum);
    Serial.print(", ");
  }
  Serial.println("");
}
delay(1000);
TfLiteStatus invokeStatus = tflInterpreter->Invoke();
// Copy the 10 class scores out of the output tensor
for (int k = 0; k < 10; k++) {
  out[k] = tflInterpreter->output(0)->data.f[k];
}
// Find the class with the highest score
float maxVal = out[0];
int maxIndex = 0;
for (int k = 0; k < 10; k++) {
  if (out[k] > maxVal) {
    maxVal = out[k];
    maxIndex = k;
  }
}
Serial.print("Number ");
Serial.print(maxIndex);
Serial.println(" detected");
Serial.print("Confidence: ");
Serial.println(maxVal);
}
12.a Color space of the LCD display doesn't match the OV7670
When displaying images from the live feed of the camera, a variety of color gradients pop up. I'm not entirely sure why this happens, but my guess is that some color space information is lost in the conversions between the camera and the display.
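One culprit worth ruling out (an assumption on my part, not a confirmed diagnosis) is byte order: if the camera buffer and the display disagree on which byte of the 16-bit RGB565 value comes first, the color channels get scrambled. A quick experiment is to swap the two bytes of each pixel before drawing and see whether the gradients disappear:
// Hypothetical endianness test: swap the two bytes of each RGB565
// pixel before drawing it, to check if the gradients are a byte-order issue
for (int i = 0; i < 28; i++) {
  for (int j = 0; j < 28; j++) {
    uint16_t p = pixels[176 * i + j];
    p = (p << 8) | (p >> 8);  // swap high and low bytes
    tft.drawPixel(i, j, p);
  }
}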
12.b LCD refreshes after printing every pixel
The approach I used prints the image pixel by pixel. The problem with the Adafruit_ST7735 library is that it pushes each pixel to the display as soon as it is drawn. I assume an easy fix would be to comment out the line of code in the library that sends the buffer.
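Instead of patching the library, an alternative that should work (a sketch under the assumption that there is RAM to spare for the canvas; I haven't tested it here) is to draw into an off-screen GFXcanvas16 from Adafruit_GFX and push the finished image to the panel in a single call:
#include <Adafruit_GFX.h>
#include <Adafruit_ST7735.h>

GFXcanvas16 canvas(28, 28);  // off-screen 16-bit frame buffer (28x28 pixels)

// Render the cropped image into RAM, then push it to the panel in one call
void showFrame(const uint16_t* pixels, Adafruit_ST7735& tft) {
  for (int i = 0; i < 28; i++) {
    for (int j = 0; j < 28; j++) {
      canvas.drawPixel(i, j, pixels[176 * i + j]);
    }
  }
  tft.drawRGBBitmap(0, 0, canvas.getBuffer(), 28, 28);  // single bulk transfer
}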
12.c Where is the camera pointing?
One of the major pain points while building this example was trying to figure out where the camera was pointing. A small 3D-printed rectangular piece of plastic that helps you eyeball roughly where the camera is looking would help a lot while collecting training data and testing the application.
Why this section?
You might be confused by the thousand steps and detours it took to build this application, so here's a list of steps to simplify building your next image recognition application:
- Decide on an idea
- Decide on the components
- Collect training data
- Keep two formats of each training image (PNG file, HEX file)
- Build and train a TinyML model
- Test TinyML model
- Integrate the TinyML model into the main application
- Test application in the real world
14.a The sketch
This sketch reads a frame from the camera and outputs the RGB565 values on the serial monitor.
/*
OV767X - Camera Test Pattern
This sketch waits for the letter 'c' on the Serial Monitor,
it then reads a frame from the OmniVision OV7670 camera and
prints the data to the Serial Monitor as a hex string.
The website https://rawpixels.net can be used to visualize the data:
width: 176
height: 144
RGB565
Little Endian
Circuit:
- Arduino Nano 33 BLE board
- OV7670 camera module:
- 3.3 connected to 3.3
- GND connected GND
- SIOC connected to A5
- SIOD connected to A4
- VSYNC connected to 8
- HREF connected to A1
- PCLK connected to A0
- XCLK connected to 9
- D7 connected to 4
- D6 connected to 6
- D5 connected to 5
- D4 connected to 3
- D3 connected to 2
- D2 connected to 0 / RX
- D1 connected to 1 / TX
- D0 connected to 10
This example code is in the public domain.
*/
#include <Arduino_OV767X.h>
unsigned short pixels[176 * 144]; // QCIF: 176x144 X 2 bytes per pixel (RGB565)
void setup() {
Serial.begin(9600);
while (!Serial);
Serial.println("OV767X Camera Capture");
Serial.println();
if (!Camera.begin(QCIF, RGB565, 1)) {
Serial.println("Failed to initialize camera!");
while (1);
}
Serial.println("Camera settings:");
Serial.print("\twidth = ");
Serial.println(Camera.width());
Serial.print("\theight = ");
Serial.println(Camera.height());
Serial.print("\tbits per pixel = ");
Serial.println(Camera.bitsPerPixel());
Serial.println();
Serial.println("Send the 'c' character to read a frame ...");
Serial.println();
}
void loop() {
if (Serial.read() == 'c') {
Serial.println("Reading frame");
Serial.println();
Camera.readFrame(pixels);
int numPixels = Camera.width() * Camera.height();
for (int i = 0; i < numPixels; i++) {
unsigned short p = pixels[i];
if (p < 0x1000) {
Serial.print('0');
}
if (p < 0x0100) {
Serial.print('0');
}
if (p < 0x0010) {
Serial.print('0');
}
Serial.print(p, HEX);
}
}
}
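rawpixels.net expects a raw binary file, while the sketch above prints hex text, so a small conversion step is needed on the host. Here is a minimal host-side helper (my own tool, not part of the project) that turns the copied hex dump into a .raw file using the little-endian byte order described in the sketch's header:
// hex2raw.cpp - convert the sketch's hex dump (four hex digits per pixel)
// into a raw little-endian RGB565 file for https://rawpixels.net
// Build and run: g++ hex2raw.cpp -o hex2raw && ./hex2raw dump.txt frame.raw
#include <cstdio>
#include <cstdint>

int main(int argc, char** argv) {
  if (argc != 3) {
    std::fprintf(stderr, "usage: %s in.txt out.raw\n", argv[0]);
    return 1;
  }
  std::FILE* in = std::fopen(argv[1], "r");
  std::FILE* out = std::fopen(argv[2], "wb");
  if (!in || !out) {
    std::fprintf(stderr, "cannot open files\n");
    return 1;
  }
  unsigned v;
  while (std::fscanf(in, "%4x", &v) == 1) {  // read one RGB565 pixel
    std::fputc(v & 0xff, out);               // low byte first (little endian)
    std::fputc(v >> 8, out);
  }
  std::fclose(in);
  std::fclose(out);
  return 0;
}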
I thank my GSoC mentor, Paul Ruiz, for guiding me throughout the project!