In this project I am going to build an object detection application using the M5Stack UnitV2. The M5Stack UnitV2 is a small, self-contained package consisting of a GC2145 1080P camera module and an SoC based on the dual-core Cortex-A7 1.2GHz SigmaStar SSD202D, with 128MB DDR3 RAM and 512MB NAND flash integrated. It also has an onboard microphone and a WiFi chip. The module comes with a preloaded Linux OS (linux-chenxing) on a 16GB MicroSD card.
The internals look similar to the image below. The module has a heatsink and a cooling fan for thermal dissipation.
M5Stack has already written a quick-start guide and comprehensive documentation (https://docs.m5stack.com/en/unit/unitv2) covering how to use the module's built-in web interface over WiFi for transfer learning of the pre-trained models and for classification/detection tasks. Please visit the link above for the initial setup and for instructions on accessing the command-line interface over SSH via WiFi, which we will be using in this project. Here I have used Edge Impulse Studio to train a model and deployed it to the M5Stack UnitV2.
Development Environment Setup
Since the M5Stack UnitV2 has only 128MB RAM and the preloaded Linux distribution is bare-minimum and does not come with a GCC compiler, we need another Linux machine for cross-compilation. I am using an Ubuntu 20.04 VM running in VirtualBox on a macOS host.
Please execute the following commands to download a specific version of the GCC ARM cross-compiler for the target device.
$ sudo apt install wget
$ wget -O gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz "https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz?revision=d0b90559-3960-4e4b-9297-7ddbc3e52783&la=en&hash=985078B758BC782BC338DB947347107FBCF8EF6B"
$ tar -xf gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz
We will use Edge Impulse Studio for data collection, training, and building a TensorFlow Lite model. We need to create an account and a new project at https://studio.edgeimpulse.com.
Data Collection
We are going to detect Glasses and Bottles, so we need to collect a few images. I used my mobile phone to capture images and uploaded them to the Edge Impulse Studio. On the Data Acquisition page, click the Show Options link and select the Use your mobile phone option; scanning the QR code connects the phone to the Edge Impulse Studio, and we can start capturing and uploading images to the project associated with that QR code.
After uploading the images we need to annotate them. Choose Edit Labels from the dropdown menu of any Collected data row and draw rectangular bounding boxes to label the objects, as shown in the GIF below.
On the Impulse Design > Create Impulse page, we can add a processing block and a learning block. For the processing block we have chosen Image, which preprocesses and normalizes image data and optionally reduces the color depth. For the learning block we have chosen Object Detection (Images), which fine-tunes a pre-trained object detection model on our data and can achieve good performance even with relatively small image datasets.
Now we need to generate features on the Impulse Design > Image page. We can go with the default parameters.
After clicking the Save Parameters button, the page redirects to the Generate Features page where we can start feature generation, which takes a few minutes. Once it finishes, we can see the output in the Feature Explorer.
Now we can go to the Impulse Design > Object Detection page and start training the MobileNetV2 SSD FPN-Lite 320x320 model, which takes a couple of minutes to finish.
After training completes, we can see the precision score, which is 90.5%, not bad for such a small amount of training data.
We can test the performance of the trained model on the test data. On the Model Testing page, click the Classify all button to run the model on all test samples. The accuracy score is 100%, so the model is doing a great job.
We will use the Edge Impulse Linux SDK for C++ to build the application. We need to make a few changes to the installation scripts and the main application code for the target device. The updated repository (a fork of the Edge Impulse GitHub repo) can be cloned on the host machine.
$ git clone https://github.com/metanav/example-standalone-inferencing-linux.git
We need to download the C++ library bundle from the Edge Impulse Studio Deployment page by selecting Create Library > C++ Library and clicking the Build button. After the build finishes, the bundle is downloaded to the local computer; we can unzip it and move its contents into the example-standalone-inferencing-linux directory.
Inferencing Code
example-standalone-inferencing-linux/source/camera.cpp
#include <unistd.h>
#include "opencv2/opencv.hpp"
#include "iostream"
#include "opencv2/videoio/videoio_c.h"
#include "nadjieb/mjpeg_streamer.hpp"
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"
using MJPEGStreamer = nadjieb::MJPEGStreamer;
static bool use_debug = false;
static float features[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT];
/**
* Resize and crop to the set width/height from model_metadata.h
*/
void resize_and_crop(cv::Mat *in_frame, cv::Mat *out_frame) {
// to resize... we first need to know the factor
float factor_w = static_cast<float>(EI_CLASSIFIER_INPUT_WIDTH) / static_cast<float>(in_frame->cols);
float factor_h = static_cast<float>(EI_CLASSIFIER_INPUT_HEIGHT) / static_cast<float>(in_frame->rows);
float largest_factor = factor_w > factor_h ? factor_w : factor_h;
cv::Size resize_size(static_cast<int>(largest_factor * static_cast<float>(in_frame->cols)),
static_cast<int>(largest_factor * static_cast<float>(in_frame->rows)));
cv::Mat resized;
cv::resize(*in_frame, resized, resize_size);
int crop_x = resize_size.width > resize_size.height ?
(resize_size.width - resize_size.height) / 2 :
0;
int crop_y = resize_size.height > resize_size.width ?
(resize_size.height - resize_size.width) / 2 :
0;
cv::Rect crop_region(crop_x, crop_y, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT);
if (use_debug) {
printf("crop_region x=%d y=%d width=%d height=%d\n", crop_x, crop_y, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT);
}
*out_frame = resized(crop_region);
}
int main(int argc, char** argv) {
// If you see: OpenCV: not authorized to capture video (status 0), requesting... Abort trap: 6
// This might be a permissions issue. Are you running this command from a simulated shell (like in Visual Studio Code)?
// Try it from a real terminal.
//
printf("OPENCV VERSION: %d_%d\n", CV_MAJOR_VERSION, CV_MINOR_VERSION);
if (argc < 2) {
printf("Requires one parameter (ID of the webcam).\n");
printf("You can find these via `v4l2-ctl --list-devices`.\n");
printf("E.g. for:\n");
printf(" C922 Pro Stream Webcam (usb-70090000.xusb-2.1):\n");
printf(" /dev/video0\n");
printf("The ID of the webcam is 0\n");
exit(1);
}
for (int ix = 2; ix < argc; ix++) {
if (strcmp(argv[ix], "--debug") == 0) {
printf("Enabling debug mode\n");
use_debug = true;
}
}
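// start an MJPEG-over-HTTP server on port 80; annotated frames are published
// to the /stream endpoint inside the main loop below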
MJPEGStreamer streamer;
streamer.start(80);
std::vector<int> params = {cv::IMWRITE_JPEG_QUALITY, 90};
// open the camera...
cv::VideoCapture camera(atoi(argv[1]));
if (!camera.isOpened()) {
std::cerr << "ERROR: Could not open camera" << std::endl;
return 1;
}
std::cout << "Resolution: " << camera.get(CV_CAP_PROP_FRAME_WIDTH)
<< "x" << camera.get(CV_CAP_PROP_FRAME_HEIGHT) << std::endl;
std::cout << "EI_CLASSIFIER_INPUT_WIDTH: " << EI_CLASSIFIER_INPUT_WIDTH
<< " EI_CLASSIFIER_INPUT_HEIGHT: " << EI_CLASSIFIER_INPUT_HEIGHT << std::endl;
if (use_debug) {
// create a window to display the images from the webcam
cv::namedWindow("Webcam", cv::WINDOW_AUTOSIZE);
}
// this will contain the image from the webcam
cv::Mat frame;
// display the frame until you press a key
while (1) {
// 100ms. between inference
int64_t next_frame = (int64_t)(ei_read_timer_ms() + 100);
// capture the next frame from the webcam
camera >> frame;
cv::Mat cropped;
resize_and_crop(&frame, &cropped);
size_t feature_ix = 0;
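// pack each BGR pixel into a single RGB888 value (0x00RRGGBB); the Edge Impulse
// image block expects one packed pixel per feature element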
for (int rx = 0; rx < (int)cropped.rows; rx++) {
for (int cx = 0; cx < (int)cropped.cols; cx++) {
cv::Vec3b pixel = cropped.at<cv::Vec3b>(rx, cx);
uint8_t b = pixel.val[0];
uint8_t g = pixel.val[1];
uint8_t r = pixel.val[2];
features[feature_ix++] = (r << 16) + (g << 8) + b;
}
}
ei_impulse_result_t result;
// construct a signal from the features buffer
signal_t signal;
numpy::signal_from_buffer(features, EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT, &signal);
// and run the classifier
EI_IMPULSE_ERROR res = run_classifier(&signal, &result, false);
if (res != 0) {
printf("ERR: Failed to run classifier (%d)\n", res);
return 1;
}
#if EI_CLASSIFIER_OBJECT_DETECTION == 1
printf("Classification result (%d ms.):\n", result.timing.dsp + result.timing.classification);
bool found_bb = false;
for (size_t ix = 0; ix < EI_CLASSIFIER_OBJECT_DETECTION_COUNT; ix++) {
auto bb = result.bounding_boxes[ix];
if (bb.value == 0) {
continue;
}
cv::rectangle(cropped, cv::Point(bb.x, bb.y), cv::Point(bb.x+bb.width, bb.y+bb.height), cv::Scalar(255,255,255), 2);
cv::putText(cropped, bb.label, cv::Point(bb.x, bb.y-5), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,0,0), 1);
found_bb = true;
printf(" %s (%f) [ x: %u, y: %u, width: %u, height: %u ]\n", bb.label, bb.value, bb.x, bb.y, bb.width, bb.height);
}
if (!found_bb) {
printf(" no objects found\n");
}
#else
printf("(DSP+Classification) %d ms.\n", result.timing.dsp + result.timing.classification);
size_t ix_max = 0;
float max_value = 0.f;
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
if (result.classification[ix].value > max_value) {
max_value = result.classification[ix].value;
ix_max = ix;
}
//printf("%s: %.05f", result.classification[ix].label, result.classification[ix].value);
//if (ix != EI_CLASSIFIER_LABEL_COUNT - 1) {
// printf(", ");
//}
}
//printf("\n");
char text[30];
sprintf(text, "%s: %.2f", result.classification[ix_max].label, result.classification[ix_max].value);
#endif
// show the image on the window
if (use_debug) {
cv::imshow("Webcam", cropped);
// wait (10ms) for a key to be pressed
if (cv::waitKey(10) >= 0)
break;
}
int64_t sleep_ms = next_frame > (int64_t)ei_read_timer_ms() ? next_frame - (int64_t)ei_read_timer_ms() : 0;
if (sleep_ms > 0) {
usleep(sleep_ms * 1000);
}
if (streamer.isAlive()) {
std::vector<uchar> buff_bgr;
cv::imencode(".jpg", cropped, buff_bgr, params);
streamer.publish("/stream", std::string(buff_bgr.begin(), buff_bgr.end()));
} else {
printf("streamer is mot alive!\n");
}
}
streamer.stop();
return 0;
}
#if !defined(EI_CLASSIFIER_SENSOR) || EI_CLASSIFIER_SENSOR != EI_CLASSIFIER_SENSOR_CAMERA
#error "Invalid model for current sensor."
#endif
Cross-compilation
We will be streaming the inferencing results over HTTP so they can be viewed in a web browser on a computer or smartphone. We need to clone the cpp-mjpeg-streamer library repository, which will be compiled and linked into the main application. We also need to download and install the OpenCV dependencies.
$ cd example-standalone-inferencing-linux
$ git clone https://github.com/nadjieb/cpp-mjpeg-streamer.git
$ ./build-opencv-linux.sh
The command below cross-compiles the main application and creates an executable in the build directory. APP_CAMERA=1 selects the camera example, TARGET_LINUX_ARMV7=1 targets the ARMv7 cross-toolchain, and USE_FULL_TFLITE=1 links the full TensorFlow Lite runtime.
$ APP_CAMERA=1 TARGET_LINUX_ARMV7=1 USE_FULL_TFLITE=1 make -j
On-device Inferencing
We have to copy the compiled executable to the target device. For some reason the executable runs correctly only as the root user, so we will copy it using the root login credentials.
$ scp build/camera root@10.254.239.1:/home/m5stack
Now log in to the M5Stack UnitV2 as the root user and execute the following commands to run the application.
$ ssh root@10.254.239.1
$ cd /home/m5stack
$ ./camera 0
We can now access the inferencing output stream at http://10.254.239.1/stream.
Live Demo
Conclusion
This project showcases the capabilities of a small, portable, battery-operated camera module that is suitable for real-time computer vision inferencing. The module also has a Grove connector that can be used for UART communication with other microcontrollers to control robotics systems, as sketched below. Also, with Edge Impulse Studio it is really easy to train and deploy custom models, which further expands the usability of this module.
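As a rough illustration of the UART idea, here is a minimal, hypothetical C++ sketch (not part of this project) that opens a serial port with termios and writes one detection result per line. The device node /dev/ttyS0 and the 115200 baud rate are assumptions; check the UnitV2 documentation for the actual Grove UART device and settings.
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
// Open a serial port in raw 8N1 mode at 115200 baud (assumed settings).
static int open_uart(const char *dev) {
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) {
        perror("open uart");
        return -1;
    }
    termios tty{};
    tcgetattr(fd, &tty);
    cfmakeraw(&tty);                      // raw byte stream, 8 data bits, no parity
    cfsetispeed(&tty, B115200);           // assumed baud rate
    cfsetospeed(&tty, B115200);
    tty.c_cflag |= CLOCAL | CREAD;        // ignore modem lines, enable receiver
    tty.c_cflag &= ~(CSTOPB | CRTSCTS);   // 1 stop bit, no hardware flow control
    tcsetattr(fd, TCSANOW, &tty);
    return fd;
}
int main() {
    // /dev/ttyS0 is a placeholder for the Grove UART device node
    int fd = open_uart("/dev/ttyS0");
    if (fd < 0) return 1;
    // one line per detection, e.g. "label,x,y,width,height"
    const char *msg = "bottle,24,56,40,64\n";
    write(fd, msg, strlen(msg));
    close(fd);
    return 0;
}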