In this article we will have a look at the new computer vision module lineup from OpenCV and Luxonis - OAK-1 and OAK-D. OAK stands for OpenCV AI Kit - it is an MIT-licensed open-source software stack and Myriad X-based hardware solution for computer vision tasks. The team behind OAK has just successfully finished a Kickstarter campaign, getting more than $1,358,000 from backers around the world, including me.
When a freelance client of mine forwarded me a link to the Kickstarter campaign, I was naturally interested in backing and reviewing the OAK boards - after all, machine learning on the edge and robotics are the two main topics my channel/blog focuses on.
Today we're going to have a look at OAK-1, a pre-production prototype of which I received from Luxonis/OpenCV for review. We will go over the specs, try pre-trained models from the Model Zoo and convert our own model to a format that can be used for fast inference on the module. If you're interested in learning more about the company and the people who made OAK-1 and OAK-D, have a look at this great interview with Luxonis CEO Brandon Gilles on pyimagesearch.com.
OAK-1 is the smaller of the two boards in the series and has only one camera.
Unlike OAK-D, which has a binocular camera with a synchronized global shutter, OAK-1 is made specifically for applications where depth information is not necessary, but price and hardware footprint are important.
UPDATED 04/04/2022. I do my best to keep my articles updated regularly, based on your feedback in the YouTube/Hackster comments sections. If you'd like to show your support and appreciation for these efforts, consider buying me a coffee (or a pizza) :)
Specs
Let's have a look at the specs from the official OAK Kickstarter campaign page.
It is outfitted with a 12 MP IMX378 image sensor with autofocus and an F-number of 2.0. At its core is the Myriad X VPU, the third-generation vision processing unit from Intel. A VPU is a specific type of microprocessor optimized for accelerating computer vision tasks. Myriad X achieves that acceleration using two of its components:
Neural Compute Engine — a dedicated hardware accelerator for deep neural network inference, in other words an ASIC, similar to Nvidia's NVDLA or Google's Edge TPU.
16 High Performance SHAVE Cores - programmable processors with an instruction set tailored for computer vision; they can run traditional computer vision workloads, or complement the Neural Compute Engine by running custom layer types for CNN applications.
Plus, the SoC has dedicated vision accelerators and a new native 4K image processing pipeline with support for up to 8 HD sensors connecting directly to the VPU.
From an operations-per-second standpoint, Myriad X achieves a combined compute capacity of 4 trillion operations per second (TOPS), of which 1 TOPS is the theoretical maximum throughput of the Neural Compute Engine. For comparison, the NVIDIA Jetson Nano has a compute capacity of 0.472 trillion floating point operations per second, the Jetson Xavier NX 21 TOPS, and the Google Edge TPU 4 TOPS. So Myriad X is similar to the Google Edge TPU in terms of neural network inference compute, but more flexible thanks to the presence of the SHAVE cores and vision accelerators.
Quick look at the module
Now for a quick look at the module itself - it is very compact, measuring 65x36 mm.
The board you can see here is a pre-production sample, but it is identical to the final production version with the exception of the OpenCV logo on the front, which will be included in the final version but is absent from the pre-production sample. We can see a high-speed USB 3.1 socket, the IMX378 image sensor and a reset button. Behind the heatsink in the center of the board is the Myriad X VPU, and there are also additional connectors, such as I2C bus connectors and UART, plus a power LED indicator.
The software installation process is very easy and user-friendly. The API for interacting with the module is open source and can be compiled for virtually any platform. Pre-compiled packages are available for Windows/Linux/macOS and there is even a pre-built Raspberry Pi image that includes everything you need to get started. I was running the samples on Ubuntu 18.04; the installation instructions can be found here.
Install the dependencies:
sudo apt install git python3-pip python3-opencv libcurl4 libatlas-base-dev libhdf5-dev libhdf5-serial-dev libqtgui4 libqt4-test
Set the udev rules:
echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
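With the board plugged in, you can check that it enumerates on the USB bus - 03e7 is the Movidius vendor ID used in the udev rule above:
lsusb | grep 03e7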
Clone the DepthAI GitHub repository and install python dependencies:
git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt
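Once everything is installed, a quick sanity check is to import the Python bindings and print the library version (assuming the package exposes the usual __version__ attribute):
python3 -c "import depthai; print(depthai.__version__)"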
That's it - keep in mind that the software for OAK boards is still in the development stage and there will very likely be some changes before the official release. For example, the master branch currently only has pre-built binaries for Python 3.6, but pre-built binaries for other Python versions will be available in the future.
There is an extensive collection of pre-trained models from the OpenVINO Model Zoo that can be used with OAK boards. Let's have a look at a few examples - run these from the cloned DepthAI repository folder:
python3 test.py -dd
This runs the default MobileNetv1 SSD object detector, trained on the 20-class PASCAL VOC 2007 dataset.
python3 test.py -dd -cnn face-detection-retail-0004
This runs a face detection model trained for retail environments.
./depthai.py -cnn face-detection-retail-0004 -cnn2 emotions-recognition-retail-0003 -dd -sh 12 -cmx 12 -nce 2
This runs two-stage inference, with face detection in the first stage and emotion recognition in the second stage (on the results of the first-stage detections). The -sh, -cmx and -nce flags set the number of SHAVE cores, CMX memory slices and Neural Compute Engines allocated to inference.
All of the above samples ran very smoothly at 30 FPS - the accuracy was impressive too, as you can see in the video. The emotion recognition model can only recognize very exaggerated emotions, but that's not a fault of the OAK board :)
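If you want to experiment with other models and settings, the demo script should list all of its options via the standard help flag:
python3 test.py -h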
Custom model conversion and inference
The Model Zoo pre-trained models are great for getting a feel for how good the performance is, but ultimately, for most people, the readily available models are not enough. So, the important question is: how flexible and easy to use is OAK when it comes to performing inference with your own models?
As for flexibility, OAK boards support all operation and network types that the Myriad X VPU supports; you can find the extensive list here. I ran a series of tests with different network architectures, including MobileNet, Tiny YOLO, SqueezeNet and some others, and could run all of them without any problems. While this is not a comprehensive assessment of compatibility, I would say that, compared to the Edge TPU for example, the OpenVINO model converter has fewer quirks and supports more operations. You can convert your model manually using the guides on the Luxonis website - the pipeline for model conversion from Tensorflow/Keras, for example, is
Frozen Graph -> Intermediate Representation model -> Binary blob
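As a rough sketch, the two conversion steps look like this with the OpenVINO toolchain (file names here are illustrative, and the exact flags depend on your model and OpenVINO version - check the Luxonis conversion guide):
python3 mo_tf.py --input_model frozen_graph.pb --data_type FP16 --output_dir ir/
myriad_compile -m ir/frozen_graph.xml -o model.blob -ip U8 -VPU_MYRIAD_PLATFORM VPU_MYRIAD_2480 -VPU_NUMBER_OF_SHAVES 4 -VPU_NUMBER_OF_CMX_SLICES 4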
Or you can use aXeleRate, a Keras-based framework for AI on the edge, which can be run locally on a Linux computer or in Google Colab.
aXeleRate can download and install the OpenVINO converter, train the model, and then automatically convert the model with the best metrics into a binary blob that can be used for inference with OAK.
For the K210 chip we made a raccoon detector as an example; this time let's pick another animal. How about kangaroos?
Super helpful for kangaroo counting on those long road trips in Australia.
Follow the step-by-step instructions in the Google Colab notebook I prepared. Alternatively, if you'd like to train the model on a local computer (currently only tested with Ubuntu 18.04), create a new virtual environment with conda or virtualenv and install the latest stable version of aXeleRate in that environment:
pip install git+https://github.com/AIWintermuteAI/aXeleRate
Then clone aXeleRate repository:
git clone https://github.com/AIWintermuteAI/aXeleRate
cd aXeleRate
and start the training from the kangaroo_detection.json config file:
python axelerate/train.py --config kangaroo_detection.json
Both in Google Colab and on your local computer, aXeleRate will install the OpenVINO converter before starting the model training - that will take some time. Also, remember to manually download the training dataset I shared on Google Drive if you're training on a local computer - change the path in kangaroo_detection.json to the actual path where you extracted the dataset.
After training the model in Google Colab, copy the binary model blob and download the example script and .json model configuration from the aXeleRate/example_scripts folder. Then run the example script on a Linux computer with this command:
python yolo.py --model YOLO_best_mAP.blob --config YOLO_best_mAP.json
and enjoy automatic kangaroo spotting.
The Kickstarter campaign for the OAK boards has successfully finished, but if you want to buy one, public pre-orders for OAK-1 and OAK-D will be launching at a price slightly higher than during the Kickstarter campaign. OAK-D definitely deserves an article of its own, and I will write one very soon. With OpenCV's backing, solid software support and powerful hardware from Intel, the OAK boards are off to a good start and will hopefully see widespread adoption in the field of computer vision.
Add me on LinkedIn if you have any questions and subscribe to my YouTube channel to get notified about more interesting projects involving machine learning and robotics.