This is the second part of the tutorial EdgeAI made simple - Exploring Image Processing, now doing Object Detection on microcontrollers. In the first part of this tutorial, we explored the most common task, Image Classification.
Introduction
Only a couple of years ago, in the project Exploring AI at the Edge, I explored some of the most common machine learning algorithms applied to Image Processing: Image Classification, Object Detection, and Pose Estimation.
At that time (2020), I was overwhelmed with what could be done with TensorFlow Lite, released by Google only a few months before. During that time, for my explorations at the Edge, I was happy to use a Raspberry Pi, powered by an Arm Cortex-A72 running at 1.5GHz. I never dreamed I would do Object Detection today on small microcontrollers such as the Arduino Portenta (Arm Cortex-M7 / 240MHz).
Edge Impulse announced that the Object Detection tests were already done even on tiny microcontrollers with a Cortex-M4F running at 80MHz.
Object Detection versus Image Classification
The main task of Image Classification models is to produce a list of the most probable object categories present in an image, for example, my tabby cat Romeo just after his dinner:
But what happens when Romeo jumps near my wine glass? The model still only recognizes the predominant category in the image, the tabby cat:
And what happens if there is no dominant category in the image?
The model completely misidentifies the above image (an "ashcan", perhaps due to the color tonalities). The model used in the previous examples is MobileNet V1, trained with ImageNet, which we also explored in the first part of this tutorial, classifying images on embedded devices.
In this case, we need another type of model, one where not only multiple categories can be found, but also where the objects are located in a given image. As we can imagine, such models are much more complex and larger, for example, the MobileNetV1 SSD, trained with the COCO dataset. The image below is the result of such a model running on a Raspberry Pi:
The models used for Object Detection (MobileNet SSD, YOLO, etc.) are usually several MB in size, which is fine for a Raspberry Pi but not suitable for embedded devices, where the RAM is usually lower than 1MB.
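For illustration, here is a minimal sketch of how such an SSD model is typically run with the TensorFlow Lite interpreter on a Raspberry Pi (the model and image file names are placeholders, not the exact code used for the screenshots; the output tensor order can vary between models):
# Minimal sketch: running a MobileNet SSD (COCO) TFLite model on a Raspberry Pi.
# File names are placeholders; the output tensor order may vary between models.
import numpy as np
from PIL import Image
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="ssd_mobilenet_v1_coco.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()

# SSD models usually expect a fixed input size (e.g., 300x300 RGB)
h, w = inp['shape'][1], inp['shape'][2]
img = Image.open("cat_and_glass.jpg").convert("RGB").resize((w, h))
interpreter.set_tensor(inp['index'], np.expand_dims(np.asarray(img, dtype=np.uint8), 0))
interpreter.invoke()

# Typical SSD outputs: bounding boxes, class ids, scores (plus a count tensor)
boxes = interpreter.get_tensor(out[0]['index'])[0]
classes = interpreter.get_tensor(out[1]['index'])[0]
scores = interpreter.get_tensor(out[2]['index'])[0]
for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:
        print("class %d  score %.2f  box (ymin, xmin, ymax, xmax) = %s" % (int(cls), score, box))
Note how the output already includes a bounding box per detected object, which is exactly what the much smaller FOMO model will not try to produce.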
An innovative solution for Object Detection: FOMO
Edge Impulse announced this week FOMO (Faster Objects, More Objects), a novel solution for performing object detection on embedded devices, not only on the Portenta (Cortex-M7) used in this tutorial, but also on Cortex-M4F devices such as the Arduino Nano 33 and the OpenMV M4 series.
A future test for FOMO could be to also use it with the ESP-CAM. We can start from my tutorial exploring Image Classification on the ESP-CAM.
My intention with this tutorial is to explore the use of FOMO for Object Detection, not to go into details about the model. But I strongly advise you to dig into the official FOMO announcement by Edge Impulse, where Louis Moreau and Mat Kelcey explain in detail how it works.
The project goal and set-up
All Machine Learning projects need to start with a detailed goal. In our case, we will detect, in an image (or video), the most common (and important) tinyML devices. I chose the devices in my lab, defining a unique label for each:
- Nano - Arduino Nano TinyML Kit
- Wio - Seeed Wio Terminal
- Pico - Raspberry Pi Pico
- Xiao - Seeed Xiao BLE Sense
- Esp - Esp32-CAM
Edge Impulse suggests that, for better performance, the objects should be of similar size, as in the Mat Kelcey project Counting Bees (Mat used a Raspberry Pi 4 and his own model there, not FOMO):
Despite that, I decided to also try mixed sizes to see the result. Still, I fixed the camera and kept the devices in a "confined space" (the green table cover), because my intention is to try the Portenta in the future in situations similar to what Edge Impulse did with bees.
We are interested in which devices are in the image, their location (centroid), and how many of them we can find. Unlike MobileNet SSD or YOLO, where the Bounding Box is one of the model outputs, FOMO does not detect the object's size.
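Under the hood, FOMO outputs something like a per-class probability grid (a heat map) over the image instead of bounding boxes, and the centroids come from the activated cells of that grid. The sketch below uses made-up numbers only to illustrate the idea of turning one class's heat map into centroid coordinates:
# Conceptual sketch only: converting a FOMO-style per-class heat map into centroids.
# Grid size, threshold, and values are illustrative, not the actual model internals.
import numpy as np

heat = np.zeros((12, 12))        # e.g., a 96x96 input reduced to a 12x12 output grid
heat[3, 4] = 0.95                # cells where the model "sees" an object of this class
heat[8, 9] = 0.88

cell = 96 // 12                  # each grid cell covers an 8x8-pixel patch of the input
for row, col in zip(*np.where(heat > 0.5)):
    cx = col * cell + cell // 2  # centroid x in input-image pixels
    cy = row * cell + cell // 2  # centroid y in input-image pixels
    print("centroid at (%d, %d), score %.2f" % (cx, cy, heat[row, col]))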
Edge Impulse Project
Go to Edge Impulse Studio, enter your credentials at Login (or create an account), and start a new project. In my case, Portenta_TinyML_Devices_Object_Detection.
My project is public, so you can clone it at EI Studio
On your Project Dashboard, go down to Project info and select Bounding boxes (object detection).
Next, go to Arduino Portenta H7 + Vision Shield and download the latest Edge Impulse firmware. A .ZIP file will be downloaded to your computer. It contains three files; choose the correct one for your Operating System.
Double-press the RESET button on your board to put it in bootloader mode.
- Open the flash script for your operating system (in my case, flash_mac.command) to flash the firmware.
- Wait until flashing is complete, and press the RESET button once to launch the new firmware.
Go to your project page (Data Acquisition section) at EI Studio, and using WebUSB, connect your Portenta:
Start collecting images of your devices. The images will be saved in the "Labeling queue" tab.
In my case, I captured around 100 images, some with devices alone and others with several in the same image. Once I had all the images, I moved to the Labeling queue tab and started selecting the objects in the images, associating them with their respective labels.
With FOMO, it is not possible to detect objects with overlapping centroids. It is possible, though, to increase the resolution of the image (or to decrease the heat map factor) to reduce this limitation. Always keep some distance between the objects.
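A quick back-of-the-envelope check of that limitation: assuming FOMO's roughly 8x spatial reduction on a 96x96 input (a 12x12 output grid), two objects whose centers land in the same 8x8-pixel cell collapse into a single detection:
# Back-of-the-envelope check (assumes an ~8x spatial reduction on a 96x96 input):
# two centroids falling in the same output-grid cell cannot be told apart.
REDUCTION = 8   # assumed reduction factor; gives a 12x12 grid for a 96x96 image

def same_cell(p1, p2):
    return (p1[0] // REDUCTION, p1[1] // REDUCTION) == (p2[0] // REDUCTION, p2[1] // REDUCTION)

print(same_cell((50, 50), (53, 55)))  # True  -> only one object would be reported
print(same_cell((50, 50), (70, 50)))  # False -> both objects can be detected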
Once you have your training dataset ready, spare some data for future tests.
Create Impulse and Generate features
An impulse takes the raw data captured by the Portenta as 320x320-pixel images, cropping them to 96x96 for optimal accuracy with the Transfer Learning model.
This cropping is the only preprocessing that our input images will undergo, since the images are already in grayscale.
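Just to visualize this step (Edge Impulse Studio does it for us), here is a rough equivalent in plain Python with Pillow, assuming a 320x320 grayscale capture as input; the file names are placeholders:
# Rough sketch of the preprocessing: a 320x320 grayscale frame is reduced
# to the 96x96 input expected by the model (file names are placeholders).
from PIL import Image

frame = Image.open("capture_320x320.png").convert("L")  # "L" = 8-bit grayscale
model_input = frame.resize((96, 96))                    # 96x96, as set in the Impulse
model_input.save("capture_96x96.png")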
Save the parameters and generate the features. Then, take a look at the feature explorer:
As expected, by applying UMAP to reduce the dimensions, we can confirm that the samples are easily separated visually, which is an excellent sign that the model should work well.
Training and Test
For training, we should select a pre-trained model. Let's use the FOMO (Faster Objects, More Objects) MobileNetV2 0.35. This model uses around 250KB of RAM and 80KB of ROM (Flash), which fits well on our board, which has 1MB of RAM and 2MB of ROM.
Regarding the training hyperparameters, the model will be trained for 100 epochs with a learning rate of 0.001.
For validation during training, 20% of the dataset will be set aside (validation_dataset). To the remaining 80% (train_dataset), we will apply Data Augmentation, which randomly flips the images, changes their size and brightness, and crops them, artificially increasing the number of training samples.
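As a hedged sketch of what that kind of augmentation looks like (written here with TensorFlow's tf.image ops; the exact transforms and parameters Edge Impulse applies may differ):
# Sketch of the kind of augmentation described above (random flip, brightness,
# size variation, and crop). Parameters are illustrative, not Edge Impulse's exact values.
import tensorflow as tf

def augment(image):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    # resize slightly larger, then randomly crop back to 96x96 to vary scale/position
    image = tf.image.resize(image, (110, 110))
    image = tf.image.random_crop(image, size=(96, 96, 1))  # 1 channel: grayscale
    return image

# train_dataset = train_dataset.map(lambda x, y: (augment(x), y))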
As a result, the model ends with practically 95% accuracy.
Note that a 6th label, "background", was automatically added to the five previously defined.
Test model with "Live Classification"
Let's do live tests while the Portenta is still paired with Edge Impulse Studio. One thing I noted is that the model can produce some false positives, which can be minimized by defining a proper Confidence Threshold. Try 0.8 or more.
Here is the result, setting the threshold at 0.9.
Disconnect the Portenta from Edge Impulse Studio and open your OpenMV IDE. Start by putting the Portenta in bootloader mode, double-pressing the reset button on the board. The built-in green LED will start fading in and out. Now return to the OpenMV IDE and click on the connect icon (left toolbar):
A pop-up will tell you that a board in DFU mode was detected and ask you how you would like to proceed. First, select "Install the latest release firmware." This action will install the latest OpenMV firmware on the Portenta H7. You can leave the option of erasing the internal file system unselected and click "OK."
The Portenta H7's green LED will start flashing while the OpenMV firmware is uploaded to the board. A terminal window will then open, showing the flashing progress.
Wait until the green LED stops flashing and fading. When the process ends, you will see a message saying, "DFU firmware update complete!".
A green play button will appear on the Tool Bar when the Portenta H7 connects.
When you click the green play button, the MicroPython script (helloworld.py) in the Code Area will be uploaded and run on the Portenta. On the Camera Viewer, you will start to see the video streaming. The Serial Monitor will show the FPS (frames per second), which should be over 60fps.
Now that we have the Portenta connected to the OpenMV IDE, let's return to Edge Impulse Studio and, in the Deployment section, select OpenMV Library:
And at the bottom of the section, press the button Build:
A zip file will be downloaded to your computer.
Unzip the file. You should find three files in it:
- ei_object_detection.py (a template to be used for object detection)
- labels.txt (a text file with the six labels used during training)
- trained.tflite (the FOMO trained model)
Move those files to your Portenta (it should appear as a NO NAME drive):
Click on the Python script ei_object_detection.py. It should be opened on your OpenMV IDE.
Start by configuring the camera (by default, the script assumes an RGB camera, but it should be configured for the Portenta's grayscale sensor):
# Configure camera
sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.B320X320)
sensor.set_windowing((240, 240))
sensor.skip_frames(time = 2000)
net = None
labels = None
To start, we should define the minimum confidence, for example, 0.8:
min_confidence = 0.8
Depending on how your camera is mounted, you could flip the image for the best results (I mirrored my camera on the vertical axis):
img = sensor.snapshot()
#img.set(h_mirror=True)
img.set(v_mirror=True)
After those changes, press the green Play button at the lower left corner of the OpenMV IDE:
The camera is looking at the following device distribution on my desk:
In the camera view, we can see the devices with their centroids marked by fixed 12-pixel circles (each circle has a distinct color, but the colors are barely visible on grayscale images). In the Serial Terminal, the model shows the labels detected and their positions in the image window (240x240).
Note that the frame rate dropped to around seven fps (the same as we got with Image Classification in part 1 of this tutorial). This is because FOMO is cleverly built on top of a CNN model, not an object-detection model like SSD MobileNet. For comparison, running a MobileNetV2 SSD FPN-Lite 320x320 model on a Raspberry Pi 4, I got only about 1.5 fps.
Another important point to note is that if we also print the raw model output tensor:
print(net.detect(img, thresholds=[(math.ceil(min_confidence * 255), 255)]))
we will get a fixed list with six elements, which, for the above image, has the following format:
[
[{"x":0, "y":0, "w":240, "h":240, "output":0.996078}],
[{"x":60, "y":140, "w":20, "h":20, "output":0.988235},
{"x":120, "y":160, "w":20, "h":20, "output":0.992157}],
[{"x":60, "y":100, "w":20, "h":20, "output":0.909804}],
[{"x":140, "y":20, "w":20, "h":20, "output":0.92549}],
[{"x":40, "y":40, "w":20, "h":20, "output":1.0}],
[{"x":140, "y":100, "w":20, "h":20, "output":0.941176},
{"x":120, "y":120, "w":20, "h":20, "output":0.992157}]
]
Each of the list elements corresponds to one of the labels. Note that index 0 is the background label, followed by all the other labels:
The list will always have six elements, even if fewer objects are detected. In such a case, the corresponding element will be empty:
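Since counting objects is one of our goals, we can walk that fixed list and count the detections per label, skipping index 0 (the background). A small snippet following the same structure as the full script below:
# Counting detected objects per label from the fixed six-element list
# returned by net.detect(); index 0 is the background class and is skipped.
detections = net.detect(img, thresholds=[(math.ceil(min_confidence * 255), 255)])
for i, detection_list in enumerate(detections):
    if i == 0:
        continue  # background
    print("%s: %d object(s)" % (labels[i], len(detection_list)))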
Here is the full code:
# Edge Impulse - OpenMV Object Detection Example
import sensor, image, time, os, tf, math, uos, gc

# Configure camera
sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)  # Set pixel format to GRAYSCALE
sensor.set_framesize(sensor.B320X320)   # 320x320 resolution for the HM01B0 camera sensor
sensor.set_windowing((240, 240))        # Crop sensor frame to model resolution
sensor.skip_frames(time = 2000)         # Let the camera adjust

net = None
labels = None
min_confidence = 0.9

try:
    # Load the model; allocate the model file on the heap if we have at least 64K free after loading
    net = tf.load("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

colors = [  # Add more colors if you are detecting more than 7 types of classes at once.
    (255,   0,   0),
    (  0, 255,   0),
    (255, 255,   0),
    (  0,   0, 255),
    (255,   0, 255),
    (  0, 255, 255),
    (255, 255, 255),
]

clock = time.clock()
while(True):
    clock.tick()
    img = sensor.snapshot()
    #img.set(h_mirror=True)
    img.set(v_mirror=True)

    # detect() returns all objects found in the image (split out per class already);
    # we skip class index 0, as that is the background, and then draw circles at the center
    # of our objects
    print(net.detect(img, thresholds=[(math.ceil(min_confidence * 255), 255)]))
    for i, detection_list in enumerate(net.detect(img, thresholds=[(math.ceil(min_confidence * 255), 255)])):
        if (i == 0): continue                    # background class
        if (len(detection_list) == 0): continue  # no detections for this class?
        print("********** %s **********" % labels[i])
        for d in detection_list:
            [x, y, w, h] = d.rect()
            center_x = math.floor(x + (w / 2))
            center_y = math.floor(y + (h / 2))
            print('x %d\ty %d' % (center_x, center_y))
            img.draw_circle((center_x, center_y, 12), color=colors[i], thickness=2)

    print(clock.fps(), "fps", end="\n\n")
Conclusion
FOMO is an important leap in the image processing space, as Louis Moreau and Mat Kelcey put it nicely yesterday:
FOMO is a ground-breaking algorithm that brings real-time object detection, tracking, and counting to microcontrollers for the first time.
I am excited about the possibilities of exploring object detection (and, more precisely, counting them) on embedded devices.
I plan to explore Portenta doing sensor fusion (camera + microphone) and object detection. For example, how about detecting dangerous Aedes aegypti mosquitos, by the sound of their wing beats and counting how many are present on a sample image?
I hope this project can help others find their way into the exciting world of AI and Electronics!
link: MJRoBot.org
Greetings from the south of the world!
See you at my next project!
Thank you
Marcelo