This post documents what has been done with the OAK-D, which is the acronym of OpenCV AI Kit with Depth, developed by Luxonis. I hope it will help me recap in the future and show the community how easy it is to run a custom dataset on the OAK-D. It's an advanced computer vision system that can run neural network inference and stereo depth sensing. This project was initiated together with another community member, with the aim of creating a system that detects the correct parts and sequence for an assembly process. With that in mind, the most convenient stand-in is my kid's stacking ring toy, where the bigger ring is stacked first and so on. Since each ring has a different color, the plan is to create a custom dataset based on ring color.
The dataset must be collected, labelled and annotated before training. In this project, Roboflow is used for annotation since it provides free community access with reasonably sufficient augmentation features plus a user-friendly interface. I really need to give a shout-out to Roboflow because they are really into machine learning and computer vision, and they produce a really good blog, which I believe many people benefit from; at least I do. Kudos to Roboflow!
Pay attention to the resize: it is set to 300x300 because that will be the input size of the model, and the annotated dataset is exported in TFRecord format because it will be used for training in TensorFlow later on. As advised in their blog, don't reveal the unique download code of your dataset.
Next, the custom dataset is ready for training. Again, thanks to Roboflow & Luxonis, somebody has put everything together nicely in a single-page Colab notebook. This work is heavily inspired by this post, https://blog.roboflow.com/luxonis-oak-d-custom-model, and I have shared my own version below for others.
A little walk-through of the notebook: after connecting to a Colab GPU runtime, the TensorFlow environment and dependencies need to be installed (sorry if the links are outdated or deprecated, because this work was done around July 2021), and the pretrained SSD-MobileNet-V2 model weights are downloaded. As mentioned above, you should use your own download code in this line (in the notebook), which will download the annotated custom dataset from your Roboflow account.
!curl -L "<<Your download code from roboflow>>" > roboflow.zip; unzip roboflow.zip; rm roboflow.zip
The training pipeline can be configured in the lines below. There is plenty of information online about machine learning parameters, and I'm no expert on that part, so I will leave it to the experts.
# Number of training steps - fewer steps train quickly, more steps can increase accuracy.
num_steps = 20000
# Number of evaluation steps.
num_eval_steps = 50
# Intersection over union.
iou_threshold = 0.50
# Learning rate
initial_learning_rate = 0.015
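In the notebook these values end up inside the model's pipeline.config before training starts. As a minimal sketch only (the file name and regex patterns are assumptions based on the TF1 Object Detection API config format, and your notebook may write the file differently), the substitution could look like this, reusing the variables defined above:

import re

# Hedged sketch: patch the pretrained model's pipeline.config with the values above.
# 'pipeline.config' and the field patterns are assumptions; adjust to your notebook.
pipeline_fname = 'pipeline.config'

with open(pipeline_fname) as f:
    config = f.read()

config = re.sub(r'num_steps: \d+', 'num_steps: {}'.format(num_steps), config)
config = re.sub(r'iou_threshold: [\d.]+',
                'iou_threshold: {}'.format(iou_threshold), config)
config = re.sub(r'initial_learning_rate: [\d.]+',
                'initial_learning_rate: {}'.format(initial_learning_rate), config)

with open(pipeline_fname, 'w') as f:
    f.write(config)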
The training result can be reviewed using TensorBoard, and the model can be validated by running inference on test images.
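For reference, TensorBoard can be launched directly inside Colab with the notebook magics below; the log directory shown here (training/) is an assumption, so point it to wherever your training job writes its event files.

%load_ext tensorboard
%tensorboard --logdir training/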
To deploy this custom model on the OAK-D, we need to convert the *.pb file (a protobuf file containing the weights and biases of the model) to a *.blob file. OpenVINO plays an important role at this step (let's not forget a big shout-out to OpenVINO as well). Install the OpenVINO toolkit to convert the TensorFlow model to an intermediate representation, which is then compiled into a *.blob file. Download the *.blob file; you will need it for the OAK-D.
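This project used the OpenVINO toolkit for the conversion (Model Optimizer to produce the .xml/.bin intermediate representation, then the compile step for the .blob). Purely as an alternative illustration, Luxonis also publishes a blobconverter Python package that can compile an existing IR into a blob; a minimal sketch, assuming hypothetical IR file names:

import blobconverter  # pip install blobconverter

# Hedged sketch, not the exact route used in this post: compile an existing
# OpenVINO IR into a .blob for the OAK-D. File names and shave count are
# assumptions; match them to your own export.
blob_path = blobconverter.from_openvino(
    xml='frozen_inference_graph.xml',
    bin='frozen_inference_graph.bin',
    data_type='FP16',
    shaves=6,
)
print(blob_path)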
3.0 DEPLOY
This work is developed based on this script: https://github.com/luxonis/depthai-python/blob/main/examples/SpatialDetection/spatial_mobilenet.py. Of course, don't forget to put the downloaded *.blob file in the same directory. (If you are new to OAK-D, you may refer to https://docs.luxonis.com/en/latest/pages/tutorials/first_steps.) As mentioned above, the credit is not mine, and I'm happy to share my own version of the Python script below. First things first, we need to set up the pipeline by configuring:
- Model path
- Label mapping
- OpenVINO version
- RGB cameras
- Left & right camera as stereo depth
from pathlib import Path
import depthai as dai

# Blob path
nnBlobPath = str((Path(__file__).parent / Path('mobilenet-ssd-openvino_2012.3_6shave_StackRingR4.blob')).resolve().absolute())
# MobilenetSSD label texts
labelMap = ["unknown", "Blue", "Body", "Green", "Orange", "Red", "Yellow"]
print("Create pipeline...")
pipeline = dai.Pipeline()
pipeline.setOpenVINOVersion(dai.OpenVINO.Version.VERSION_2021_3)
# Setup color camera
cam = pipeline.createColorCamera()
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
cam.setIspScale(2, 3) # to match the monocamera 720p
cam.setBoardSocket(dai.CameraBoardSocket.RGB)
cam.initialControl.setManualFocus(130)
cam.initialControl.setManualExposure(20000, 600) # Exposure time in us max 33k, sensitivity ISO max 1.6k
# color camera setting for spatial neural network
cam.setPreviewSize(300, 300)
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
# Setup mono camera
left = pipeline.createMonoCamera()
left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)
left.setBoardSocket(dai.CameraBoardSocket.LEFT)
right = pipeline.createMonoCamera()
right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)
right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
# Setup stereo
stereo = pipeline.createStereoDepth()
stereo.initialConfig.setConfidenceThreshold(245)
stereo.initialConfig.setMedianFilter(dai.StereoDepthProperties.MedianFilter.KERNEL_3x3)
#stereo.setLeftRightCheck(True)
stereo.setSubpixel(False)
stereo.setExtendedDisparity(False)
#stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
left.out.link(stereo.left)
right.out.link(stereo.right)
print("Create spatial neural network ")
sdn = pipeline.createMobileNetSpatialDetectionNetwork()
sdn.setBlobPath(nnBlobPath)
sdn.setConfidenceThreshold(0.5)
sdn.input.setBlocking(False)
sdn.setBoundingBoxScaleFactor(0.2)
sdn.setDepthLowerThreshold(100)
sdn.setDepthUpperThreshold(5000)
cam.preview.link(sdn.input)
stereo.depth.link(sdn.inputDepth)
# Setup stream
cam_xout = pipeline.createXLinkOut()
cam_xout.setStreamName("cam")
cam.isp.link(cam_xout.input)
depth_xout = pipeline.createXLinkOut()
depth_xout.setStreamName("dep")
sdn.passthroughDepth.link(depth_xout.input)
sdn_xout = pipeline.createXLinkOut()
sdn_xout.setStreamName("det")
sdn.out.link(sdn_xout.input)
bbox_xout = pipeline.createXLinkOut()
bbox_xout.setStreamName("bbox")
sdn.boundingBoxMapping.link(bbox_xout.input)
print("Pipeline created.")
Pay attention to this line; it is responsible for running the inference and producing the results used in the later part.
sdn = pipeline.createMobileNetSpatialDetectionNetwork()
Once the pipeline is created, the device is started and output queues are created on the host for its streams. The lines below return the label, confidence and bounding box info, such as the ROI position.
...
detections = inDet.detections
if len(detections) != 0:
    bboxMap = bboxQ.get()
    bboxes = bboxMap.getConfigData()
...
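For context, the queues referenced above (detQ, bboxQ and friends in my script) come from the booted device. A rough sketch of how they might be created, assuming the stream names defined earlier ("cam", "dep", "det", "bbox") and hypothetical queue variable names:

# Hedged sketch: host-side queues for the XLinkOut streams defined in the pipeline.
with dai.Device(pipeline) as device:
    camQ = device.getOutputQueue(name="cam", maxSize=4, blocking=False)
    depQ = device.getOutputQueue(name="dep", maxSize=4, blocking=False)
    detQ = device.getOutputQueue(name="det", maxSize=4, blocking=False)
    bboxQ = device.getOutputQueue(name="bbox", maxSize=4, blocking=False)

    while True:
        frame = camQ.get().getCvFrame()  # full-resolution ISP frame for display
        inDet = detQ.get()               # spatial detection results
        ...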
The lines below display the label, confidence and bounding box of the detected object in the streaming window.
cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
cv2.putText(frame, "{:.0f}%".format(confidence), (x1 - 50, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # thickness of 2 px
Up to this point, the custom model runs successfully and manages to detect the objects it was trained on. If your interest is to train your own dataset and run object detection with it, you may skip the following part.
The next part is the sequence check for the stacking rings. It basically extracts the location of each detected object, compares it to a known sequence and highlights whether it is correct or wrong, plus some visualization in the GUI.
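My exact check isn't reproduced here, but the idea can be sketched as below: sort the ring detections from the bottom of the frame upwards and compare the label order against the expected stacking order. EXPECTED_ORDER is a made-up example, not the real toy's order, and the helper name is hypothetical.

# Hedged sketch of the sequence-check idea, not the actual script.
EXPECTED_ORDER = ["Blue", "Green", "Yellow", "Orange", "Red"]  # hypothetical bottom-to-top order

def check_sequence(detections):
    # Keep only ring detections (ignore the "Body" class)
    rings = [d for d in detections if labelMap[d.label] != "Body"]
    # Bounding box coordinates are normalized; a larger ymax sits lower in the frame
    rings.sort(key=lambda d: d.ymax, reverse=True)
    observed = [labelMap[d.label] for d in rings]
    # Correct so far if the observed order matches the start of the expected order
    return observed == EXPECTED_ORDER[:len(observed)], observed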
Note: Although this project has the stereo depth node running, I didn't use its depth value because:
- I don't need it to determine the correct sequence of the stacking rings.
- I haven't figured out how to get good information from the disparity map, as fairly large areas have invalid disparity values (one common workaround is sketched below).
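For completeness, a hedged sketch of a workaround I did not use in this project: ignore the invalid (zero) pixels inside the detection ROI and take the median of the remaining depth values. The helper name and ROI arguments are hypothetical.

import numpy as np

def roi_depth_mm(depth_frame, x1, y1, x2, y2):
    # depth_frame is the uint16 depth image in millimetres from the stereo node
    roi = depth_frame[y1:y2, x1:x2]
    valid = roi[roi > 0]  # zero means "no valid disparity" at that pixel
    return int(np.median(valid)) if valid.size else None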
Enjoy a short demo below.