Since the dawn of human existence, birds have represented an outsized challenge to anyone hoping to track their types and migrations. We have come a long way from the days when tracking the movements of birds might be integral to our survival, but we still have a great scientific interest in cataloging their migratory habits.
Birds connect habitats, resources and biological processes. They also contribute to so-called ecosystem services – as natural enemies of pests, pollinators of fruit, seed transporters or "garbage police". A few types of birds visit my window in the morning, and that gave me the idea to build automatic recognition using AI on the edge. The information I gather can be used to learn which types of birds are most common at a specific time of year (e.g. bluejays in the summer), which type of bird is most likely to appear at my window in the morning, and so on.
As part of the Vision Challenge - Smart Eyes on MCU, I used the Grove Vision AI Module V2 provided by Seeed Studio. For this project you don't need big and expensive hardware: thanks to the small dimensions and the power of the Grove Vision AI Module V2, it is possible to do amazing and useful things.
Project goal
The project goal is to determine which type of bird is present, from the set {bluejay, cardinal, titmouse}.
In other words, for this case, the goal is to detect and classify three specific types of object appearing in an image.
Firstly, the setup is very simple: you need the Grove Vision AI module with a camera.
Second, you need data: images of the specific bird types. I used the camera provided with the Grove Vision AI module to take samples and collected 102 of them.
To train my model, I used Edge Impulse Studio. It is a web-based platform for machine learning on edge devices, and it is fully GUI-driven.
The dataset has 102 images in total, with 6 classes: bluejay, cardinal, mark, testing, titmouse, unlabeled.
Impulse Design and Pre-Processing
An impulse takes raw data (in this case, images), extracts features (resizes the pictures), and then uses a learning block to classify new data.
Classifying images is the most common use of deep learning. With transfer learning (TL), we can fine-tune a pre-trained image classification model on our own data and achieve good performance even with a relatively small image dataset (our case). TL also performs very well on low-power, constrained devices.
First, starting from the raw images, we resize them to 96x96 pixels and send them as input to our Transfer Learning block:
In the image configuration under Impulse design I selected RGB for the color depth. With that setting, each data sample has 27,648 features (96x96x3).
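For reference, the resizing and flattening the image block performs is roughly equivalent to this Python sketch (the file name is just a hypothetical example):

from PIL import Image
import numpy as np

# Load one sample, force RGB and resize to 96x96 (hypothetical file name)
img = Image.open("bluejay_01.jpg").convert("RGB").resize((96, 96))

# Flatten into a single feature vector: 96 * 96 * 3 = 27,648 features
features = np.asarray(img, dtype=np.float32).reshape(-1)
print(features.shape)  # (27648,)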
I used the MobileNet architecture because it is tiny, has low latency, and performs well on edge devices.
For this project, I used the MobileNetV2 0.35 model. The final layer of the model (before the output layer) has 16 neurons with 10% dropout to prevent overfitting, as you can see in the Transfer learning option under Impulse design:
MobileNetV2 96x96 0.35 (final layer: 16 neurons, 0.1 dropout)
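Edge Impulse assembles this block automatically, but a rough Keras equivalent of the architecture described above could look like the sketch below (the frozen ImageNet backbone is an assumption, not the exact Edge Impulse graph):

import tensorflow as tf

# Pre-trained MobileNetV2 backbone: width multiplier 0.35, 96x96 RGB input
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.35,
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False  # freeze the backbone; only the head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(16, activation="relu"),   # final 16-neuron layer
    tf.keras.layers.Dropout(0.1),                   # 10% dropout against overfitting
    tf.keras.layers.Dense(6, activation="softmax"), # 6 output classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])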
Another essential technique in deep learning is data augmentation. Data augmentation is the process of artificially generating new data from existing data, primarily to train machine learning (ML) models. A data augmentation system makes small, random (but realistic) transformations to your training data during training, such as flipping, cropping, or rotating the images. This matters because real conditions involve imperfect cameras, noise, rotations and so on; augmentation "simulates" these real environmental conditions, which leads to better results in the end. The parameters for training are set as in the image below:
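As an aside, the same kinds of random flips, rotations and zooms can be expressed with Keras preprocessing layers; this is only an illustration, not the exact set of transformations Edge Impulse applies:

import tensorflow as tf

# Label-preserving random transformations, active only during training
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # rotate by up to +/-10% of a full turn
    tf.keras.layers.RandomZoom(0.1),      # zoom in or out by up to 10%
])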
The input layer has 27,648 features and the output layer has 6 classes.
The training result is as below:
The most important factor is the inference time: the lower it is, the faster the object detection runs. You can see that the peak RAM usage is 334 KB and the flash usage (the size of the model exported to the MCU) is 585 KB. The Grove Vision AI Module V2 has around 2.5 MB of internal SRAM, which is more than enough for this project.
The result is very good: 93.8 % accuracy.
After training, under the Dashboard, it is possible to download the TensorFlow Lite model:
Now I will convert this model into an optimized version that can run on an embedded system with an Arm Ethos-U55 NPU. The conversion is done with the Vela compiler.
I am using Ubuntu Linux, where you can install the Vela tool by typing:
pip install ethos-u-vela
To confirm vela is installed, type:
vela --version
The response is:
['3.12.0']
Now that Vela is installed, we can convert the model.tflite file we downloaded from Edge Impulse:
vela model.tflite --accelerator-config ethos-u55-64
The output file we get is model_vela.tflite.
The resulting model can now be deployed to the Grove Vision module. I used the SenseCraft Web-Toolkit because it provides a simple camera preview in the web environment, as well as a Device Log with a serial monitor. The Device Log shows the output class (the output tensor), in our case the detected bird type.
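If you prefer to consume the Device Log programmatically rather than in the browser, a minimal pyserial sketch could look like this; the port name and the exact log format are assumptions, so check what your module actually prints:

import serial

# Port name is an assumption; on Linux it is often /dev/ttyACM0 or /dev/ttyUSB0
with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as port:
    while True:
        line = port.readline().decode("utf-8", errors="ignore").strip()
        if line:
            print(line)  # e.g. a detection event naming the bird class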
This project can be extended, as I said in the beginning, with various reports, such as which types of birds are most common at a specific time of year or which type of bird is most likely to appear at my window in the morning. Those reports can be sent via MQTT by coupling a small Seeed Studio XIAO board, which would count and analyze the detections and send the information over the MQTT protocol for further analysis; a sketch of this follows below.
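As a sketch of that extension, a XIAO board running MicroPython could forward each detection over MQTT with umqtt.simple; the broker address, topic and payload format are all assumptions:

from umqtt.simple import MQTTClient

# Hypothetical broker address and client id; replace with your own setup
client = MQTTClient("bird-counter", "192.168.1.10")
client.connect()

def report(bird_class):
    # Publish the detected class so it can be counted and analyzed later
    client.publish(b"birds/window/detections", bird_class.encode())

report("bluejay")  # e.g. after parsing a detection from the Grove module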
To conclude, the Seeed Studio Grove Vision AI Module V2 is a powerful and tiny device that can be used for serious applications in embedded machine learning.
The demonstration is shown below: