TensorFlow Lite/Micro provides a good example for face recognition. However, I would like to build my own image recognizer using my own dataset. So this time, I will generate a trained model from my own dataset (rock-paper-scissors) and have a Sony Spresense recognize it.
For the training dataset, I use monochrome, scaled images acquired with the Spresense camera. There are four classes of data: Background Noise (Others), Rock, Scissors, and Paper.
Excerpts from the "Background Noise" dataset
Excerpts from the "Rock" dataset
Excerpts from the "Scissors" dataset
Excerpts from the "Paper" dataset
These data are listed in a CSV text file, one line per image, each line containing the path to the image and its label.
./Others/PICT1000.png,0
./Others/PICT1001.png,0
./Others/PICT1002.png,0
./Others/PICT1003.png,0
./Others/PICT1004.png,0
./Others/PICT1005.png,0
....
./Paper/PICT1747.png,1
./Paper/PICT1748.png,1
./Paper/PICT1749.png,1
./Paper/PICT1750.png,1
./Paper/PICT1751.png,1
....
./Rock/PICT000.png,2
./Rock/PICT001.png,2
./Rock/PICT003.png,2
./Rock/PICT004.png,2
./Rock/PICT005.png,2
.....
./Scissors/PICT1697.png,3
./Scissors/PICT1698.png,3
./Scissors/PICT1699.png,3
./Scissors/PICT1700.png,3
./Scissors/PICT1701.png,3
....
The dataset for evaluation is created in the same way.
Dataset list for training: rps_training.txt
Dataset list for evaluation: rps_validation.txt
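For reference, such a list can be generated with a short script. This is a minimal sketch, not part of the original workflow; it assumes the class folders Others, Paper, Rock, and Scissors sit in the current directory and uses the label numbering shown above:
import os

labels = {'Others': 0, 'Paper': 1, 'Rock': 2, 'Scissors': 3}
with open('rps_training.txt', 'w') as f:
    for folder, label in labels.items():
        # one CSV line per PNG file: ./<folder>/<file>,<label>
        for name in sorted(os.listdir(folder)):
            if name.endswith('.png'):
                f.write('./%s/%s,%d\n' % (folder, name, label))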
Output the trained model in TensorFlow
Now that the dataset is ready, the next step is to design and train a neural network in TensorFlow.
First, import the libraries. I also added a few lines to suppress verbose log messages.
import sys
import tensorflow as tf
from tensorflow import keras
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# To silence verbose logging
tf.autograph.set_verbosity(0)
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
The next step is to load the dataset. This part of the code is a bit awkward: I just want to shuffle the data, but I have to put the images and labels into a tf.data dataset and then put them back into NumPy arrays after the shuffling is finished. I think it could be done a little more elegantly, but this is a quick solution (a simpler alternative is sketched after the listing).
t_df = pd.read_csv('./dataset/training/rps_training.txt', header=None)
t_images_path = t_df.iloc[:,0]
t_labels = t_df.iloc[:,1]
v_df = pd.read_csv('./dataset/validation/rps_validation.txt', header=None)
v_images_path = v_df.iloc[:,0]
v_labels = v_df.iloc[:,1]
# read labels
t_labels = tf.convert_to_tensor(t_labels)
t_labels = tf.keras.utils.to_categorical(t_labels, 4)
v_labels = tf.convert_to_tensor(v_labels)
v_labels = tf.keras.utils.to_categorical(v_labels, 4)
# read image paths
for i in range(len(t_images_path)):
    t_images_path[i] = './dataset/training' + t_images_path[i][1:]
t_images_path = [str(path) for path in t_images_path]
for i in range(len(v_images_path)):
    v_images_path[i] = './dataset/validation' + v_images_path[i][1:]
v_images_path = [str(path) for path in v_images_path]
t_img_path_ds = tf.data.Dataset.from_tensor_slices(t_images_path)
v_img_path_ds = tf.data.Dataset.from_tensor_slices(v_images_path)
# define the function to normalize images from 0-255 to 0-1.0
def load_and_preprocess_from_path(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_image(image, channels=1, expand_animations=False)
    image = tf.image.resize(image, [28, 28])
    image /= 255.0  # normalize to [0,1] range
    return image
# load image objects
# t_images_ds: image dataset for training
# v_images_ds: image dataset for validation
# The images are normalized as part of this mapping
AUTOTUNE = tf.data.experimental.AUTOTUNE
t_images_ds = t_img_path_ds.map(load_and_preprocess_from_path, num_parallel_calls=AUTOTUNE)
t_images_ds.element_spec
t_images_ds.cardinality()
v_images_ds = v_img_path_ds.map(load_and_preprocess_from_path, num_parallel_calls=AUTOTUNE)
v_images_ds.element_spec
v_images_ds.cardinality()
# put the labels into datasets
t_labels_ds = tf.data.Dataset.from_tensor_slices(t_labels)
v_labels_ds = tf.data.Dataset.from_tensor_slices(v_labels)
# combine datasets of images and labels
t_image_label_ds = tf.data.Dataset.zip((t_images_ds, t_labels_ds))
v_image_label_ds = tf.data.Dataset.zip((v_images_ds, v_labels_ds))
# shuffle the datasets
t_ds = t_image_label_ds.shuffle(buffer_size=len(t_image_label_ds))
v_ds = v_image_label_ds.shuffle(buffer_size=len(v_image_label_ds))
# convert datasets to numpy arrays
t_np_images = np.empty((0,28,28), dtype=float)
t_np_labels = np.empty((0,4), dtype=int)
for img, lbl in t_ds.take(len(t_ds)):
    img = img.numpy().reshape(1,28,28)
    lbl = lbl.numpy().reshape(1,4)
    t_np_images = np.append(t_np_images, img, axis=0)
    t_np_labels = np.append(t_np_labels, lbl, axis=0)
v_np_images = np.empty((0,28,28), dtype=float)
v_np_labels = np.empty((0,4), dtype=int)
for img, lbl in v_ds.take(len(v_ds)):
    img = img.numpy().reshape(1,28,28)
    lbl = lbl.numpy().reshape(1,4)
    v_np_images = np.append(v_np_images, img, axis=0)
    v_np_labels = np.append(v_np_labels, lbl, axis=0)
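As an aside, the detour through tf.data just for shuffling can be avoided. The following is a minimal sketch of a simpler alternative (not the code used here), which builds the training arrays directly and shuffles them with one random permutation:
# Build the training arrays directly from the image paths...
t_np_images = np.stack([load_and_preprocess_from_path(p).numpy().reshape(28, 28)
                        for p in t_images_path])
t_np_labels = np.array(t_labels)  # already one-hot encoded above
# ...and shuffle images and labels together with a single permutation.
perm = np.random.permutation(len(t_np_images))
t_np_images, t_np_labels = t_np_images[perm], t_np_labels[perm]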
Define a neural network. I used a simple convolutional neural network.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(
        filters=6, kernel_size=(5, 5), padding='same', activation=tf.nn.relu, name="conv2d_6"),
    keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation=tf.nn.relu, name="dense_32"),
    keras.layers.Dense(4),
    keras.layers.Activation(tf.nn.softmax)
])
#model.compile(optimizer='adam', loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
Training is performed on the NumPy arrays generated earlier.
batch_size = 32
epochs = 100
model.fit(x=t_np_images, y=t_np_labels, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
_, test_accuracy = model.evaluate(x=v_np_images, y=v_np_labels, verbose=1)
print('test accuracy = %f' % test_accuracy)
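As an optional check for overfitting (not part of the original code), the return value of model.fit can be plotted with matplotlib, which is imported above but otherwise unused. A minimal sketch, assuming the call above is changed to history = model.fit(...):
# plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()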
The trained model is output to a file named "model.tflite".
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Show model size in KBs.
tflite_model_size = len(tflite_model) / 1024
print('Original model size = %dKBs.' % tflite_model_size)
# Save the model to disk
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
The size of the file is 151 kB, which is a bit large, so the model is optimized to make it smaller. The output "model.tflite" is loaded again, and the converter is rerun with optimization (quantization) enabled.
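The converter configuration below refers to a function called representative_dataset_gen, which is not shown in the listing. A minimal sketch of such a generator, assuming it yields samples from the training images loaded earlier:
def representative_dataset_gen():
    # Yield a few hundred training images, one at a time, as float32 batches of one.
    for i in range(min(300, len(t_np_images))):
        yield [t_np_images[i].reshape(1, 28, 28).astype(np.float32)]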
interpreter = tf.lite.Interpreter('model.tflite')
interpreter.allocate_tensors()
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()
# Show model size in KBs.
tflite_model_size = len(tflite_model) / 1024
print('Quantized model size = %dKBs.' % tflite_model_size)
# Save the model to disk
open('qmodel.tflite', "wb").write(tflite_model)
The size is reduced to 42 kB. The contents of the generated trained model "qmodel.tflite" are converted to text and written out as a header file, "qmodel.h". Include and use this file from the Arduino sketch.
import binascii
def convert_to_c_array(bytes) -> str:
    hexstr = binascii.hexlify(bytes).decode("UTF-8")
    hexstr = hexstr.upper()
    array = ["0x" + hexstr[i:i + 2] for i in range(0, len(hexstr), 2)]
    array = [array[i:i+10] for i in range(0, len(array), 10)]
    return ",\n ".join([", ".join(e) for e in array])
tflite_binary = open('qmodel.tflite', 'rb').read()
ascii_bytes = convert_to_c_array(tflite_binary)
header_file = "const unsigned char model_tflite[] = {\n " + ascii_bytes + "\n};\nunsigned int model_tflite_len = " + str(len(tflite_binary)) + ";"
# print(header_file)
with open("qmodel.h", "w") as f:
    f.write(header_file)
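Incidentally, an equivalent header can also be generated from the command line with xxd -i qmodel.tflite > qmodel.h; note that xxd names the array after the file (qmodel_tflite), so the sketch would then have to use that name instead of model_tflite.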
Incorporating the trained model into Spresense
The following is the program code to acquire images with the Spresense camera and recognize them. Copy the "qmodel.h" generated earlier into the same folder as the sketch and use it.
This program code sets up TensorFlow Lite/Micro and activates the camera streaming function. Images captured by the camera are 320x240 YUV422 frames, which need to be converted to 28x28 monochrome images to match the dataset before being passed to the trained model for inference.
#include <Camera.h>
#include "Adafruit_GFX.h"
#include "Adafruit_ILI9341.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "qmodel.h"
tflite::ErrorReporter* error_reporter = nullptr;
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
int inference_count = 0;
constexpr int kTensorArenaSize = 100000;
uint8_t tensor_arena[kTensorArenaSize];
#define DNN_IMG_W 28
#define DNN_IMG_H 28
#define CAM_IMG_W 320
#define CAM_IMG_H 240
#define CAM_CLIP_X 48
#define CAM_CLIP_Y 8
#define CAM_CLIP_W 224
#define CAM_CLIP_H 224
#define TFT_RST 8
#define TFT_DC 9
#define TFT_CS 10
Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS ,TFT_DC ,TFT_RST);
uint16_t disp[DNN_IMG_W*DNN_IMG_H]; // display buffer for the resized 28x28 image
void disp_image(uint16_t* buf, int w, int h) {
  for (int n = 0; n < w*h; ++n) {
    uint16_t value = buf[n];
    uint16_t y_h = (value & 0xf000) >> 8;
    uint16_t y_l = (value & 0x00f0) >> 4;
    value = (y_h | y_l);
    uint16_t value6 = (value >> 2);
    uint16_t value5 = (value >> 3);
    disp[n] = (value5 << 11) | (value6 << 5) | value5;
  }
  tft.drawRGBBitmap(0, 0, disp, w, h);
}
void CamCB(CamImage img) {
  static uint32_t last_mills = 0;
  if (!img.isAvailable()) {
    Serial.println("img is not available");
    return;
  }
  int sx = CAM_CLIP_X;
  int sy = CAM_CLIP_Y;
  int ex = CAM_CLIP_X + CAM_CLIP_W - 1;
  int ey = CAM_CLIP_Y + CAM_CLIP_H - 1;
  CamImage small;
  CamErr err = img.clipAndResizeImageByHW(small, sx, sy, ex, ey, DNN_IMG_W, DNN_IMG_H);
  if (!small.isAvailable()) {
    Serial.println("Clip and Resize CamImage failed (CamErr) : " + String(err));
    return;
  }
  uint16_t* buf = (uint16_t*)small.getImgBuff();
  for (int i = 0; i < DNN_IMG_W*DNN_IMG_H; ++i) {
    uint16_t value = buf[i];
    uint16_t y_h = (value & 0xf000) >> 8;
    uint16_t y_l = (value & 0x00f0) >> 4;
    value = (y_h | y_l);
    input->data.f[i] = (float)(value)/255.0;
  }
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    Serial.println("Invoke failed");
    return;
  }
  for (int n = 0; n < 4; ++n) {
    float value = output->data.f[n];
    Serial.println("score[" + String(n) + "] " + String(value));
  }
  disp_image(buf, DNN_IMG_W, DNN_IMG_H);
}
void setup() {
  Serial.begin(115200);
  tft.begin();
  tft.setRotation(3);
  tflite::InitializeTarget();
  memset(tensor_arena, 0, kTensorArenaSize*sizeof(uint8_t));
  // Set up logging.
  static tflite::MicroErrorReporter micro_error_reporter;
  error_reporter = &micro_error_reporter;
  // Map the model into a usable data structure.
  model = tflite::GetModel(model_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model provided is schema version "
                   + String(model->version()) + " not equal "
                   + "to supported version "
                   + String(TFLITE_SCHEMA_VERSION));
    return;
  } else {
    Serial.println("Model version: " + String(model->version()));
  }
  // This pulls in all the operation implementations we need.
  static tflite::AllOpsResolver resolver;
  // Build an interpreter to run the model with.
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
  interpreter = &static_interpreter;
  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    Serial.println("AllocateTensors() failed");
    return;
  } else {
    Serial.println("AllocateTensors() success");
  }
  size_t used_size = interpreter->arena_used_bytes();
  Serial.println("Arena used bytes: " + String(used_size));
  input = interpreter->input(0);
  output = interpreter->output(0);
  Serial.println("Model input:");
  Serial.println("dims->size: " + String(input->dims->size));
  for (int n = 0; n < input->dims->size; ++n) {
    Serial.println("dims->data[" + String(n) + "]: " + String(input->dims->data[n]));
  }
  Serial.println("Model output:");
  Serial.println("dims->size: " + String(output->dims->size));
  for (int n = 0; n < output->dims->size; ++n) {
    Serial.println("dims->data[" + String(n) + "]: " + String(output->dims->data[n]));
  }
  Serial.println("Completed tensorflow setup");
  theCamera.begin();
  CamErr err = theCamera.startStreaming(true, CamCB);
  if (err != CAM_ERR_SUCCESS) {
    Serial.println("start streaming err: " + String(err));
    return;
  }
}
void loop() {
}
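The callback above only prints the four raw scores. As a hypothetical extension (not in the original sketch), the highest score can be mapped back to the class names used in the dataset (0: Others, 1: Paper, 2: Rock, 3: Scissors) at the end of CamCB:
// Hypothetical addition to CamCB: report the best-scoring class after Invoke().
const char* kLabels[4] = {"Others", "Paper", "Rock", "Scissors"};
int best = 0;
for (int n = 1; n < 4; ++n) {
  if (output->data.f[n] > output->data.f[best]) best = n;
}
Serial.println("Detected: " + String(kLabels[best]) + " (" + String(output->data.f[best]) + ")");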
Impressions after making it
It has been a while since I used TensorFlow, but my first impression is that it consumes quite a bit of memory. Moreover, it is troublesome to create the dataset by myself. To be honest, I thought that Sony's Neural Network Console would be more memory efficient for the Spresense platform and would make the datasets easier to handle.
However, TensorFlow supports a variety of platforms, and being able to draw on its wealth of assets is a major attraction. Since Spresense has two solutions available, I would like to use them on a case-by-case basis.