In this project, the steps followed to implement our person detection application track the workflow of the referenced book [1]:
- Decide on the goal – person detection using the Arduino Nano 33 BLE Sense and an Arducam
- Collect a dataset – Visual Wake Words (derived from the COCO dataset)
- Design a model architecture – convolutional neural network (MobileNet v1)
- Train the model – training, validation, and testing
- Convert the model – using the TensorFlow Lite Converter
- Run inference – output "person detected" and "not detected" scores, light a green/red LED
- Evaluate and troubleshoot – real-world performance
In a typical environment, TensorFlow is used to build and train large machine learning models. A TensorFlow model is a set of instructions that tell an interpreter how to transform data in order to produce an output. To use a model, we load it into memory and execute it with the TensorFlow interpreter.
The TensorFlow interpreter is designed to run on powerful desktop computers and servers, so running models on microcontrollers requires a different interpreter. TensorFlow provides an interpreter and supporting tools for small, low-powered devices, called "TensorFlow Lite".
Run Inference

After the model is converted, it can be deployed using the TensorFlow Lite for Microcontrollers C++ library. The code takes raw input data from the sensors and transforms it into the same form the model was trained on. The transformed data is then passed to the model to run inference.
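As a rough sketch of what loading and preparing a model looks like with this library (the model array name, arena size, and exact headers are assumptions and vary with the library version, not this project's exact code):

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Assumed name for the model compiled into a C byte array (see "Model" later).
extern const unsigned char g_person_detect_model_data[];

// Scratch memory for the interpreter's tensors; the size is an assumption
// and must be tuned to the model and the board's available RAM.
constexpr int kTensorArenaSize = 96 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

static tflite::MicroErrorReporter error_reporter;
static tflite::AllOpsResolver resolver;  // registers all built-in ops

const tflite::Model* model = tflite::GetModel(g_person_detect_model_data);
static tflite::MicroInterpreter interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, &error_reporter);

void SetupModel() {
  // Reserve space inside the arena for the model's tensors.
  interpreter.AllocateTensors();
}
```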
The result is an output containing predictions. In the case of our person detection model, the output is a score for each of our classes, "person" and "no person". For models that classify data, the scores for all classes typically sum to 1, and the class with the highest score is the prediction; the larger the gap between the scores, the higher the confidence in the prediction. In our application the scores range from -100 to +100.
Each individual inference considers only a snapshot of the data: it tells us the probability that a person was present within roughly the last second, based on the captured camera image.
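Continuing the sketch above, a single inference step might look like the following; the output indices are assumptions and should be verified against the trained model, and an int8-quantized model is assumed:

```cpp
// Assumed output ordering; verify against the trained model's labels.
constexpr int kNotAPersonIndex = 0;
constexpr int kPersonIndex = 1;

// Runs one inference and returns true if "person" scored highest.
bool RunOneInference() {
  // The input tensor must already contain a 96x96 grayscale frame.
  if (interpreter.Invoke() != kTfLiteOk) {
    return false;  // inference failed on this frame
  }
  TfLiteTensor* output = interpreter.output(0);
  int8_t person_score = output->data.int8[kPersonIndex];
  int8_t no_person_score = output->data.int8[kNotAPersonIndex];
  // The class with the higher score is the prediction for this snapshot.
  return person_score > no_person_score;
}
```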
When a person is detected, the hardware lights a green LED; when no person is detected, a red LED stays on. This can be extended to any other type of control. The person/not-person use case checks whether a person is present, and sensing person/not-person is useful in many applications, including smart homes, retail, and smart buildings [2]. The inference result could feed into various IoT devices, enabling other devices to start or triggering alerts. More generally, low-cost vision sensors can be deployed to sense the presence of specific objects, such as pets in a home or cars in a garage.
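The LED control itself is a few lines in an Arduino sketch. A minimal sketch, assuming placeholder pin numbers (set to whatever pins the LEDs are actually wired to, and configured with pinMode() in setup()):

```cpp
// Pin numbers are placeholders; match them to your wiring.
const int kGreenLedPin = 2;
const int kRedLedPin = 3;

void RespondToDetection(int8_t person_score, int8_t no_person_score) {
  bool person_detected = person_score > no_person_score;
  // Green on / red off when a person is seen; the reverse otherwise.
  digitalWrite(kGreenLedPin, person_detected ? HIGH : LOW);
  digitalWrite(kRedLedPin, person_detected ? LOW : HIGH);
}
```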
Visual Wake Words Dataset

The Visual Wake Words dataset was obtained by re-labeling the images of the COCO dataset according to whether an object of interest is present [2]. COCO is a publicly available dataset of natural images of everyday scenes that contain multiple objects [3]. The new labels were created based on the following criteria:
- Label 1 – the image contains at least one bounding box for the object of interest (e.g., a person)
- Label 0 – the image contains no objects from that class (e.g., no person)
The TensorFlow command used to generate new annotations can be modified to classify the presence of any object of interest from the COCO categories.
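The rule behind these labels is simple enough to state in code. As an illustration only (the actual re-labeling is done by the TensorFlow tooling, not this snippet, and the real process can also filter out very small boxes), the per-image decision reduces to:

```cpp
#include <string>
#include <vector>

// One COCO bounding-box annotation, simplified for illustration.
struct Annotation {
  std::string category;  // e.g. "person"
};

// Visual Wake Words label: 1 if at least one box of the class of
// interest is present, 0 otherwise.
int VisualWakeWordsLabel(const std::vector<Annotation>& boxes,
                         const std::string& class_of_interest) {
  for (const auto& box : boxes) {
    if (box.category == class_of_interest) return 1;
  }
  return 0;
}
```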
Application Architecture

Our embedded machine learning application for person detection follows this sequence [1] (see the sketch after the list):
- Obtain an input (camera)
- Preprocess the input to extract features suitable to feed into the model
- Run inference on the processed input
- Post-process the model's output
- Use the resulting information to act (light up different LEDs)
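Mapped onto an Arduino-style sketch, and reusing the names from the earlier sketches, the sequence might look like the following skeleton (GetImage is an illustrative helper, not the project's exact code):

```cpp
void loop() {
  TfLiteTensor* input = interpreter.input(0);

  // 1-2. Obtain a camera frame, preprocessed to 96x96 grayscale.
  GetImage(96, 96, 1, input->data.int8);

  // 3. Run inference on the processed input.
  if (interpreter.Invoke() != kTfLiteOk) return;

  // 4. Post-process the model's output into per-class scores.
  TfLiteTensor* output = interpreter.output(0);
  int8_t person_score = output->data.int8[kPersonIndex];
  int8_t no_person_score = output->data.int8[kNotAPersonIndex];

  // 5. Act on the result (light the green or red LED).
  RespondToDetection(person_score, no_person_score);
}
```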
Image data is represented as an array of pixel values. We obtain our image data from the embedded camera module (Arducam Mini 2MP Plus), which already provides data in this format, so little preprocessing is required; the main step is converting the color image to grayscale. The application takes a snapshot of data from the camera, feeds it into the model, finds out which output class was detected, and then displays the result on the serial monitor and the output LEDs.
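For instance, if the camera delivers 16-bit RGB565 pixels (an assumption; the Arducam supports several output formats), the color-to-grayscale step can be done per pixel with standard luminance weights:

```cpp
#include <cstdint>

// Convert one RGB565 pixel to an 8-bit grayscale value.
uint8_t Rgb565ToGray(uint16_t p) {
  uint8_t r = (p >> 11) & 0x1F;  // 5 bits of red
  uint8_t g = (p >> 5) & 0x3F;   // 6 bits of green
  uint8_t b = p & 0x1F;          // 5 bits of blue
  // Expand each channel to 8 bits.
  uint32_t r8 = (r * 255) / 31;
  uint32_t g8 = (g * 255) / 63;
  uint32_t b8 = (b * 255) / 31;
  // Weighted sum approximating 0.299 R + 0.587 G + 0.114 B.
  return (uint8_t)((77 * r8 + 150 * g8 + 29 * b8) >> 8);
}
```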
The figure below shows the structure of the person detection application:
This person detection model uses the MobileNet architecture trained on the Visual Wake Words dataset. The model takes a 96x96 pixel grayscale image as input; each image is a 3D tensor with shape (96, 96, 1). The final dimension (1) holds a single pixel value, ranging from 0 (black) to 255 (white).
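Assuming an int8-quantized model whose input zero point is -128 (an assumption; check the actual model's quantization parameters), copying a grayscale frame into that tensor might look like:

```cpp
// Copy a 96x96 grayscale frame into the model's input tensor.
// The -128 shift assumes an int8-quantized input with zero point -128.
void FillInputTensor(const uint8_t* frame, TfLiteTensor* input) {
  for (int y = 0; y < 96; ++y) {
    for (int x = 0; x < 96; ++x) {
      int idx = y * 96 + x;  // the (96, 96, 1) tensor is laid out row by row
      input->data.int8[idx] = static_cast<int8_t>(frame[idx] - 128);
    }
  }
}
```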
The model outputs two scores ranging from -100 to +100: one indicating that a person was present in the input, and another indicating that no person was present.
The program is divided into five main parts (the model data array is illustrated after the list):
- Main loop – embedded applications run in a continuous loop
- Image provider – captures an image from the camera and writes it into the input tensor
- TensorFlow Lite interpreter – runs the TensorFlow Lite model, transforming the input image into a set of scores
- Model – a data array, run by the interpreter
- Detection responder – takes the output scores and uses them to display the result
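The "model as a data array" part deserves a note: a microcontroller has no filesystem to load a .tflite file from, so the model is compiled directly into the binary as a byte array, typically generated offline with a tool such as xxd. A sketch of that arrangement, with assumed names:

```cpp
// Generated offline, e.g.: xxd -i person_detect.tflite > model_data.cc
extern const unsigned char g_person_detect_model_data[];
extern const unsigned int g_person_detect_model_data_len;

// The interpreter reads the model directly out of this array in flash.
const tflite::Model* model = tflite::GetModel(g_person_detect_model_data);
```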
In this section we show how to deploy the code onto the selected device, the Arduino Nano 33 BLE Sense. Arduino has a great variety of compatible third-party hardware and libraries, and we used the Arducam Mini 2MP Plus [6] camera in this project. We connect the camera module to the Arduino board via the following pins:
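Once the module is wired up, initialization in the sketch might look roughly like this; a minimal sketch assuming the ArduCAM Arduino library, with a chip-select pin that must match the actual wiring:

```cpp
#include <Wire.h>
#include <SPI.h>
#include <ArduCAM.h>

const int kCameraChipSelect = 7;  // assumption: match this to your CS wiring
ArduCAM myCAM(OV2640, kCameraChipSelect);

void SetupCamera() {
  Wire.begin();  // I2C bus, used to configure the camera's registers
  SPI.begin();   // SPI bus, used to read out image data
  pinMode(kCameraChipSelect, OUTPUT);
  myCAM.InitCAM();
}
```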
This project was presented for the course AIE-604 Deep Learning Applications, taught by Dr. Mohab Mangoud.
References

[1] P. Warden and D. Situnayake, TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers, O'Reilly, 2019.
[2] A. Chowdhery, P. Warden, J. Shlens, A. Howard and R. Rhodes, "Visual Wake Words Dataset," Google Research, 2019.
[3] "COCO - Common Objects in Context," [Online]. Available: https://cocodataset.org/#home. [Accessed May 2022].