Deep learning has dramatically improved computer vision performance, allowing it to reach human-level or, in some cases, even superhuman-level abilities. Over the past few years, the frameworks for training deep neural networks have become user-friendly enough that an average user with some Python programming skills can train and use neural networks for a wide array of computer vision tasks, including image classification, object detection and semantic segmentation.
There is one caveat though: deep neural networks are known to require large quantities of training data to achieve good results. In some cases you can use open datasets, and the problem is solved.
For image classification, small to medium-sized datasets can be obtained by scraping the web. For object detection the situation is more difficult, since training an object detection network requires not only the images but also annotation files containing bounding box coordinates. So, if no good open-source detection dataset is available for the object in question, your only option is to create the dataset manually, which can be a tedious task.
Unless...
You can automate the dataset creation process by using synthetic data. There are a few ways to generate synthetic data for object detection:
1) Simply paste objects onto backgrounds and randomize their orientation/scale/position
2) Use a realistic 3D rendering engine, such as Unreal Engine
3) Use a GAN for data generation? In that case you would already need a network capable of recognizing/detecting the object in question (the discriminator in the GAN), so it's a bit of a chicken-and-egg problem
In this article we will focus on the simplest and easiest-to-dissect method: cut-and-paste. Don't be fooled by its apparent simplicity or the unrealistic look of the images generated by the script. Convolutional neural networks don't have logic or common sense, so even seemingly absurd images make good learning material for our object detection network. A minimal sketch of the idea is shown below.
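To make the cut-and-paste idea concrete, here is a minimal illustrative sketch. This is not the actual generation code from the repository - the function name and parameters are made up for illustration. It pastes a masked object onto a background at a random position and scale and records the resulting bounding box:
import random

import cv2
import numpy as np

def paste_object(background, obj, mask, scale=1.0):
    # Resize the object and its binary mask (object pixels are non-zero);
    # assumes the scaled object still fits inside the background
    h, w = obj.shape[:2]
    new_w, new_h = int(w * scale), int(h * scale)
    obj = cv2.resize(obj, (new_w, new_h))
    mask = cv2.resize(mask, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    # Pick a random top-left corner for the paste location
    bg_h, bg_w = background.shape[:2]
    x = random.randint(0, bg_w - new_w)
    y = random.randint(0, bg_h - new_h)
    # Copy only the object pixels into the background region
    roi = background[y:y + new_h, x:x + new_w]
    roi[mask > 0] = obj[mask > 0]
    # The paste location directly gives us the bounding box annotation
    return background, (x, y, x + new_w, y + new_h)
The real generator adds blending modes, distractors and occlusion checks on top of this basic operation, but the principle stays the same: because we place the object ourselves, the annotation comes for free.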
My task in question was a Lego detection model for the MARK robotic platform. I found a good Lego classification dataset on Kaggle, but had no luck with readily available detection datasets. So I decided to re-purpose the code used to generate synthetic scenes for the paper Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection.
Download and prepare the object instances
Download the Lego images dataset from here. It was itself procedurally generated from 3D models using Autodesk Maya 2020.
For synthetic data generation we will need the object instances and their binary masks. In our case, since the Lego bricks are all on a black background, we can simply use the following thresholding script to generate the masks. We also randomly color the Lego bricks, since we want the model to detect Lego bricks of different colors.
# Standard imports
import cv2
import numpy as np
import os
import random
import sys

# Channel masks (BGR) used to randomly recolor the bricks
colors = ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1])

input_folder = sys.argv[1]
output_folder = sys.argv[2]

# Create the output folders for the recolored images and their masks
os.makedirs(os.path.join(output_folder, "imgs"), exist_ok=True)
os.makedirs(os.path.join(output_folder, "masks"), exist_ok=True)

for folder in os.listdir(input_folder):
    for file in os.listdir(os.path.join(input_folder, folder)):
        print(file)
        img = cv2.imread(os.path.join(input_folder, folder, file))
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # The bricks sit on a black background, so thresholding at 1 separates
        # object from background; with THRESH_BINARY_INV the object pixels
        # become 0 and the background 255
        ret, thresh = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY_INV)
        # Coloring: multiply the object pixels by a randomly chosen channel mask
        color = random.randint(0, len(colors) - 1)
        img[thresh == 0] *= np.array(colors[color], dtype='uint8')
        # Write the recolored image and its binary mask
        cv2.imwrite(os.path.join(output_folder, "imgs", file), img)
        cv2.imwrite(os.path.join(output_folder, "masks", file), thresh)
        cv2.imshow('final', img)
        cv2.waitKey(50)
cv2.destroyAllWindows()
Run the above script with the names of the input folder (which contains one sub-folder per object type, each with images of that object) and the output folder, where the images and masks will be saved, e.g.
python helper.py objects output
You will see images being processed and saved.
Next, clone my fork of the Cut, Paste and Learn paper code - I changed it to work with Python 3 and to accept .png images as masks.
git clone https://github.com/AIWintermuteAI/syndata-generation.git
Install all the necessary dependencies (I recommend installing the dependencies and running the scripts inside a Python virtual environment, such as conda or virtualenv):
pip install -r requirements.txt
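If you go the virtualenv route, a minimal setup could look like this (the environment name is arbitrary):
python3 -m venv syndata-env
source syndata-env/bin/activate
pip install -r requirements.txt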
Place the folder with images and masks into data_dir/objects_dir and add or change the background pictures in data_dir/backgrounds. Distractors are objects other than the ones we are trying to detect - I didn't use them while working on this project. Then run the generation script with the following command:
python dataset_generator.py data_dir/objects_dir/lego/imgs output_dir/ --num 3 --scale --dontocclude
And...
After we have the data, we need to structure it properly. There need to be four folders: training images, training annotations, validation images and validation annotations. Simply cut some of the pictures and annotations from the data you just generated and paste them into the validation images and annotations folders. Make sure you cut and do NOT copy the images/annotations - a short script like the one below can do the move for you.
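Here is one possible way to do the split. This is a sketch: the folder names and the .xml annotation extension are assumptions, so adjust them to match the layout of your generated data. Note that shutil.move moves the files rather than copying them:
import os
import random
import shutil

src_imgs, src_anns = "output_dir/images", "output_dir/annotations"
val_imgs, val_anns = "output_dir/val_images", "output_dir/val_annotations"
os.makedirs(val_imgs, exist_ok=True)
os.makedirs(val_anns, exist_ok=True)

random.seed(42)  # reproducible split
files = sorted(os.listdir(src_imgs))
val_files = random.sample(files, k=int(0.1 * len(files)))  # hold out 10%

for name in val_files:
    stem = os.path.splitext(name)[0]
    # Move (not copy) each validation image together with its annotation
    shutil.move(os.path.join(src_imgs, name), os.path.join(val_imgs, name))
    shutil.move(os.path.join(src_anns, stem + ".xml"),
                os.path.join(val_anns, stem + ".xml"))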
Training the model
You can use any framework/scripts to train the model - however, I advise using aXeleRate, a Keras-based framework for AI on the edge. It will automatically train the model and convert the best model from the training session into the format you require for inference on the edge - currently it supports trained model conversion to .kmodel (K210), .tflite (full integer and dynamic range quantization supported) and OpenVINO IR model formats. Experimental support: Google Edge TPU, TensorRT.
Install aXeleRate on your local machine with
pip install git+https://github.com/AIWintermuteAI/aXeleRate
To download examples run:
git clone https://github.com/AIWintermuteAI/aXeleRate.git
You can run quick tests with tests_training_and_inference.py in the aXeleRate folder. It will run training and inference for each model type, then save and convert the trained models. Since it only trains for 5 epochs and the dataset is very small, you will not get useful models; this script is only meant to check for the absence of errors.
For actual training you need to run the following command:
python axelerate/train.py -c config/lego_detector.json
You can download an example .json configuration file and pre-trained models from here. Make sure you change the training and validation image/annotation folder paths to match their location on your system. For further explanation of the additional parameters in the configuration file, have a look at this article. A rough sketch of what such a config can look like is shown below.
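For orientation, a detector config has roughly the shape below. This fragment is an illustration based on the general structure of aXeleRate configs rather than the exact schema, so start from the downloaded example file and only edit the paths, labels and training parameters:
{
    "model": {
        "type": "Detector",
        "architecture": "MobileNet7_5",
        "input_size": 224,
        "anchors": [0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
                    5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "labels": ["lego"]
    },
    "train": {
        "actual_epoch": 50,
        "train_image_folder": "lego/imgs",
        "train_annot_folder": "lego/anns",
        "valid_image_folder": "lego/imgs_validation",
        "valid_annot_folder": "lego/anns_validation",
        "batch_size": 32,
        "learning_rate": 0.001,
        "saved_folder": "projects/lego"
    },
    "converter": {
        "type": ["k210", "tflite"]
    }
}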
Inference
After training is finished, you can do a quick sanity check and perform inference on your computer with the following command:
python axelerate/infer.py -c config/lego_detector.json --weights path-to-h5-weights
The following steps depend on the hardware you want to run the trained model on. For Raspberry Pi, for example, use the generated .tflite model and this example script; a rough outline of the inference side is sketched below.
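To give an idea of the Raspberry Pi side, here is a sketch using the tflite_runtime package. The model and image filenames are placeholders, and the output decoding is intentionally left schematic - a YOLO-style detection head still requires anchor decoding and non-maximum suppression, which the linked example script takes care of:
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="lego_detector.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize the frame to the network input size
img = cv2.imread("test.jpg")
_, height, width, _ = input_details[0]['shape']
input_data = np.expand_dims(cv2.resize(img, (width, height)), axis=0)
if input_details[0]['dtype'] == np.float32:
    # Float / dynamic range models expect normalized float input
    input_data = input_data.astype(np.float32) / 255.0

interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
raw_output = interpreter.get_tensor(output_details[0]['index'])
print(raw_output.shape)  # decode boxes and scores from this tensor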
In this article we will use a K210-based robotic platform for AI education, MARK (stands for Make A Robot Kit). Copy the .kmodel file from the project folder to an SD card and insert the SD card into the cyberEye mainboard - cyberEye is a customized version of Maixduino. For starters, let's do a quick test of our robot's Lego detection capabilities using Codecraft, a graphical programming environment from TinkerGen.
Open Codecraft, choose MARK(cyberEye) as the device, add the Custom models extension and define an Object Detection model with the following properties:
Then create the following code with the newly appeared blocks:
If you feel stuck, you can download the .cdc file for Codecraft from this article's attachments.
It works best with bigger Lego bricks, but it can also detect smaller ones. If just grabbing some Lego bricks is not enough for you and you are up for a little challenge, you can try tweaking the MicroPython code I wrote for MARK. You can see the end result of the code execution in this video:
MARK detects and grabs Lego bricks and then scans the area for an April Tag. After finding the tag, the robot approaches it until a set distance is reached. Then it drops the Lego brick, turns around and continues from the beginning of the loop.
If you use the same model and the same printed April Tag (A3 paper, tag36h11_1), you can simply execute the code in MaixPy IDE and watch your robot collect the Lego bricks!
Add me on LinkedIn if you have any questions and subscribe to my YouTube channel to get notified about more interesting projects involving machine learning and robotics.