In this project, the steps followed to implement our person detection application track the workflow of the referenced book [1]:
- Decide on the goal – person detection using the Arduino Nano 33 BLE Sense and an Arducam
- Collect a dataset – Visual Wake Words (derived from the COCO dataset)
- Design a model architecture – convolutional neural network (MobileNet v1)
- Train the model – training, validation, and testing
- Convert the model – using the TensorFlow Lite Converter
- Run inference – output "person detected" and "not detected" scores, light a green/red LED
- Evaluate and troubleshoot – real-world performance
In a typical environment, TensorFlow is used to build and train large machine learning models. A TensorFlow model is a set of instructions that tell an interpreter how to transform data in order to produce an output. To use a model, we load it into memory and execute it with the TensorFlow interpreter.
The TensorFlow interpreter is designed to run on powerful desktop computers and servers, so running models on microcontrollers requires a different interpreter. TensorFlow provides an interpreter and supporting tools for small, low-powered devices, called "TensorFlow Lite".
Run Inference

After the model is converted, it can be deployed using the TensorFlow Lite for Microcontrollers C++ library. The code takes raw input data from the sensors and transforms it into the same form the model was trained on. The transformed data is then passed to the model to run inference.
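As a rough sketch of what loading and preparing a model looks like with this library (the model array name, arena size, and exact headers are assumptions and vary with the library version, not this project's exact code):

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Assumed name for the model compiled into a C byte array (see "Model" later).
extern const unsigned char g_person_detect_model_data[];

// Scratch memory for the interpreter's tensors; the size is an assumption
// and must be tuned to the model and the board's available RAM.
constexpr int kTensorArenaSize = 96 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

static tflite::MicroErrorReporter error_reporter;
static tflite::AllOpsResolver resolver;  // registers all built-in ops

const tflite::Model* model = tflite::GetModel(g_person_detect_model_data);
static tflite::MicroInterpreter interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);

void SetupModel() {
  // Reserve space inside the arena for the model's tensors.
  interpreter.AllocateTensors();
}
```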
The result is an output containing predictions. In the case of our person detection model, the output is a score for each of our classes, "person" and "no person". For models that classify data, the scores for all classes typically sum to 1, and the class with the highest score is the prediction; the larger the gap between the scores, the higher the confidence in the prediction. In our application the scores range from -100 to +100.
Each individual inference considers only a snapshot of the data: it tells us the probability that a person was present within roughly the last second, based on the captured camera image.
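Continuing the sketch above, a single inference step might look like the following; the output indices are assumptions and should be verified against the trained model, and an int8-quantized model is assumed:

```cpp
// Assumed output ordering; verify against the trained model's labels.
constexpr int kNotAPersonIndex = 0;
constexpr int kPersonIndex = 1;

// Runs one inference and returns true if "person" scored highest.
bool RunOneInference() {
  // The input tensor must already contain a 96x96 grayscale frame.
  if (interpreter.Invoke() != kTfLiteOk) {
    return false;  // inference failed on this frame
  }
  TfLiteTensor* output = interpreter.output(0);
  int8_t person_score = output->data.int8[kPersonIndex];
  int8_t no_person_score = output->data.int8[kNotAPersonIndex];
  // The class with the higher score is the prediction for this snapshot.
  return person_score > no_person_score;
}
```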
When a person is detected, the hardware lights a green LED; when no person is detected, a red LED stays on. This can be extended to any other type of control. The person/not-person use case checks whether a person is present, and sensing person/not-person is useful in many applications, including smart homes, retail, and smart buildings [2]. The inference result could feed into various IoT devices, enabling other devices to start or triggering alerts. More generally, low-cost vision sensors can be deployed to sense the presence of specific objects, such as pets in a home or cars in a garage.
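The LED control itself is a few lines in an Arduino sketch. A minimal sketch, assuming placeholder pin numbers (set to whatever pins the LEDs are actually wired to, and configured with pinMode() in setup()):

```cpp
// Pin numbers are placeholders; match them to your wiring.
const int kGreenLedPin = 2;
const int kRedLedPin = 3;

void RespondToDetection(int8_t person_score, int8_t no_person_score) {
  bool person_detected = person_score > no_person_score;
  // Green on / red off when a person is seen; the reverse otherwise.
  digitalWrite(kGreenLedPin, person_detected ? HIGH : LOW);
  digitalWrite(kRedLedPin, person_detected ? LOW : HIGH);
}
```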
Visual Wake Words Dataset

The Visual Wake Words dataset was obtained by re-labeling the images of the COCO dataset according to whether an object of interest is present [2]. COCO is a publicly available dataset of natural images of everyday scenes that contain multiple objects [3]. The new labels were created based on the following criteria:
- Label 1 – the image contains at least one bounding box for the object of interest (e.g., a person)
- Label 0 – the image contains no objects from that class (e.g., no person)
The TensorFlow command used to generate new annotations can be modified to classify the presence of any object of interest from the COCO categories.
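The rule behind these labels is simple enough to state in code. As an illustration only (the actual re-labeling is done by the TensorFlow tooling, not this snippet, and the real process can also filter out very small boxes), the per-image decision reduces to:

```cpp
#include <string>
#include <vector>

// One COCO bounding-box annotation, simplified for illustration.
struct Annotation {
  std::string category;  // e.g. "person"
};

// Visual Wake Words label: 1 if at least one box of the class of
// interest is present, 0 otherwise.
int VisualWakeWordsLabel(const std::vector<Annotation>& boxes,
                         const std::string& class_of_interest) {
  for (const auto& box : boxes) {
    if (box.category == class_of_interest) return 1;
  }
  return 0;
}
```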
Application Architecture

Our embedded machine learning application for person detection follows this sequence [1] (see the sketch after the list):
- Obtain an input (camera)
- Preprocess the input to extract features suitable to feed into the model
- Run inference on the processed input
- Post-process the model's output
- Use the resulting information to act (light up different LEDs)
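Mapped onto an Arduino-style sketch, and reusing the names from the earlier sketches, the sequence might look like the following skeleton (GetImage is an illustrative helper, not the project's exact code):

```cpp
void loop() {
  TfLiteTensor* input = interpreter.input(0);

  // 1-2. Obtain a camera frame, preprocessed to 96x96 grayscale.
  GetImage(96, 96, 1, input->data.int8);

  // 3. Run inference on the processed input.
  if (interpreter.Invoke() != kTfLiteOk) return;

  // 4. Post-process the model's output into per-class scores.
  TfLiteTensor* output = interpreter.output(0);
  int8_t person_score = output->data.int8[kPersonIndex];
  int8_t no_person_score = output->data.int8[kNotAPersonIndex];

  // 5. Act on the result (light the green or red LED).
  RespondToDetection(person_score, no_person_score);
}
```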
Image data is represented as an array of pixel values. We obtain our image data from the embedded camera module (Arducam Mini 2MP Plus), which already provides data in this format, so little preprocessing is required; the main step is converting the color image to grayscale. The application takes a snapshot of data from the camera, feeds it into the model, finds out which output class was detected, and then displays the result on the serial monitor and the output LEDs.
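For instance, if the camera delivers 16-bit RGB565 pixels (an assumption; the Arducam supports several output formats), the color-to-grayscale step can be done per pixel with standard luminance weights:

```cpp
#include <cstdint>

// Convert one RGB565 pixel to an 8-bit grayscale value.
uint8_t Rgb565ToGray(uint16_t p) {
  uint8_t r = (p >> 11) & 0x1F;  // 5 bits of red
  uint8_t g = (p >> 5) & 0x3F;   // 6 bits of green
  uint8_t b = p & 0x1F;          // 5 bits of blue
  // Expand each channel to 8 bits.
  uint32_t r8 = (r * 255) / 31;
  uint32_t g8 = (g * 255) / 63;
  uint32_t b8 = (b * 255) / 31;
  // Weighted sum approximating 0.299 R + 0.587 G + 0.114 B.
  return (uint8_t)((77 * r8 + 150 * g8 + 29 * b8) >> 8);
}
```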
The figure below shows the structure of the person detection application:
This person detection model uses the MobileNet architecture trained on the Visual Wake Words dataset. The model takes a 96x96 pixel grayscale image as input; each image is a 3D tensor with shape (96, 96, 1). The final dimension (1) holds a single pixel value, ranging from 0 (black) to 255 (white).
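Assuming an int8-quantized model whose input zero point is -128 (an assumption; check the actual model's quantization parameters), copying a grayscale frame into that tensor might look like:

```cpp
// Copy a 96x96 grayscale frame into the model's input tensor.
// The -128 shift assumes an int8-quantized input with zero point -128.
void FillInputTensor(const uint8_t* frame, TfLiteTensor* input) {
  for (int y = 0; y < 96; ++y) {
    for (int x = 0; x < 96; ++x) {
      int idx = y * 96 + x;  // the (96, 96, 1) tensor is laid out row by row
      input->data.int8[idx] = static_cast<int8_t>(frame[idx] - 128);
    }
  }
}
```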
The model outputs two scores ranging from -100 to +100: one indicating that a person was present in the input, and another indicating that no person was present.
The program is divided into five main parts (the model data array is illustrated after the list):
- Main loop – embedded applications run in a continuous loop
- Image provider – captures an image from the camera and writes it into the input tensor
- TensorFlow Lite interpreter – runs the TensorFlow Lite model, transforming the input image into a set of scores
- Model – a data array, run by the interpreter
- Detection responder – takes the output scores and uses them to display the result
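The "model as a data array" part deserves a note: a microcontroller has no filesystem to load a .tflite file from, so the model is compiled directly into the binary as a byte array, typically generated offline with a tool such as xxd. A sketch of that arrangement, with assumed names:

```cpp
// Generated offline, e.g.: xxd -i person_detect.tflite > model_data.cc
extern const unsigned char g_person_detect_model_data[];
extern const unsigned int g_person_detect_model_data_len;

// The interpreter reads the model directly out of this array in flash.
const tflite::Model* model = tflite::GetModel(g_person_detect_model_data);
```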
In this section we show how to deploy the code onto the selected device, the Arduino Nano 33 BLE Sense. Arduino has a great variety of compatible third-party hardware and libraries, and we used the Arducam Mini 2MP Plus [6] camera in this project. We connect the camera module to the Arduino board via the following pins:
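Once the module is wired up, initialization in the sketch might look roughly like this; a minimal sketch assuming the ArduCAM Arduino library, with a chip-select pin that must match the actual wiring:

```cpp
#include <Wire.h>
#include <SPI.h>
#include <ArduCAM.h>

const int kCameraChipSelect = 7;  // assumption: match this to your CS wiring
ArduCAM myCAM(OV2640, kCameraChipSelect);

void SetupCamera() {
  Wire.begin();  // I2C bus, used to configure the camera's registers
  SPI.begin();   // SPI bus, used to read out image data
  pinMode(kCameraChipSelect, OUTPUT);
  myCAM.InitCAM();
}
```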
This project was presented for the course AIE-604 Deep Learning Applications, taught by Dr. Mohab Mangoud.
References

[1] P. Warden and D. Situnayake, TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers, O'Reilly, 2019.
[2] A. Chowdhery, P. Warden, J. Shlens, A. Howard and R. Rhodes, "Visual Wake Words Dataset," Google Research, 2019.
[3] "COCO - Common Objects in Context," [Online]. Available: https://cocodataset.org/#home. [Accessed May 2022].