Motorcycles and bicycles are the main means of transportation in many countries. The number of these vehicles on the road increases every year, and so does the number of accidents. In some countries, more than 50% of all road accidents involve motorcycles, so riding a motorcycle or bicycle demands good safety practices.
One of the most important protections is wearing a helmet. In an accident, the rider's head may hit the ground hard, and without head protection the accident can be fatal. Many countries therefore have laws requiring helmets while riding, and police are deployed to stop motorcyclists who do not wear them. Relying on humans alone is not an effective way to enforce this. For this reason, we want to implement an intelligent system that detects whether a cyclist or rider is wearing a helmet.
Overview
We present a helmet detection system based on YOLOv3-Tiny running on the Xilinx Kria KV260 FPGA board. The system detects helmets and heads (no helmet) in images and videos in real time. An overview of our work is shown in the following figure.
In this work, the input will be images or videos. YOLOv3-Tiny then generates an output that contains the class prediction and the position of the object in the input images or videos.
To implement it on the KV260 device, we adopt the framework of the smart camera accelerated application provided by Xilinx (https://xilinx.github.io/kria-apps-docs/main/build/html/docs/smartcamera/smartcamera_landing.html).
The webcam is connected to the KV260 board via a USB port, and the output is streamed to the computer over the Ethernet port using the RTSP protocol.
Fortunately, we don't have to build our own integrated design, since Xilinx already provides one in the smart camera application. Here is what is inside the application.
The smart camera application is implemented on top of the Vitis Video Analytics SDK (VVAS). The preprocessor resizes the input images and performs colour space conversion before AI inference, and the DPU runs the inference. The inference results are converted into class predictions and object bounding boxes, and this output is streamed to the computer, where we display it on the screen.
Dataset Preparation
Our dataset contains several types of helmets, including cycling helmets, motorcycle helmets and safety helmets. There are 5,764 images in total.
We want to detect only 2 classes: Helmets and Heads (Not helmets).
We collect datasets from two sources:
1. https://www.kaggle.com/andrewmvd/helmet-detection
2. https://www.kaggle.com/andrewmvd/hard-hat-detection
Both datasets use the same label design, where 0 is a helmet and 1 is not a helmet. We then create a new directory and copy the images and labels from both datasets into this new folder, as sketched below.
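As a rough sketch of this merging step (the folder names helmet-detection, hard-hat-detection and helmet_dataset below are placeholders, and we assume the images and YOLO-format .txt label files from both downloads already sit in those per-dataset folders), a short Python script can do the copying:

import glob
import os
import shutil

# Placeholder folders; adjust to where the two Kaggle datasets were extracted
sources = ["helmet-detection", "hard-hat-detection"]
destination = "helmet_dataset"
os.makedirs(destination, exist_ok=True)

for src in sources:
    # Copy every image and its matching YOLO .txt label file into the merged folder
    for path in glob.glob(os.path.join(src, "*")):
        if path.endswith((".png", ".jpg", ".txt")):
            shutil.copy(path, destination)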
Training Results
We train the model on Google Colab using the Darknet framework (https://github.com/AlexeyAB/darknet/). Training runs for 700,000 iterations with a batch size of 8, and the input image size is 416x416 pixels.
Here is the model performance.
Precision: TP/(TP + FP) = 0.63
Recall: TP/(TP + FN) = 0.74
Average IoU = 48.29%
mean Average Precision (mAP) = 43.12%
These are the results after training the model for 10 days (6-7 hours per day). The precision is not very high but acceptable, and the recall is above 0.7, which means the model detects most of the objects.
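To make the metric definitions above concrete, here is a small worked example in Python (the TP/FP/FN counts are made up for illustration, not our actual detection counts):

# Hypothetical counts for illustration only
tp, fp, fn = 63, 37, 22

precision = tp / (tp + fp)  # fraction of predicted boxes that are correct
recall = tp / (tp + fn)     # fraction of ground-truth objects that are detected

print(f"Precision = {precision:.2f}, Recall = {recall:.2f}")
# Precision = 0.63, Recall = 0.74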
Mapping YOLOv3-Tiny to KV260
To run YOLOv3-Tiny on the KV260, we need to convert the model from the Darknet format (.cfg, .weights) to a TensorFlow-compatible file (.pb).
1. Convert Darknet to Tensorflow
We use https://github.com/jinyu121/DW2TF to convert the trained model. After the conversion process, we get .pb and .ckpt files.
2. Build a frozen graph
We need to combine the .pb and .ckpt files into a single file by running the following steps:
- Create a directory for the output and name it freeze_graph
- Run the freeze_graph command with the following parameters:
freeze_graph --input_graph yolov3-tiny_helmet.pb --input_checkpoint yolov3-tiny_helmet.ckpt --output_graph frozen_graph.pb --output_node_names yolov3-tinyconvolutional13/BiasAdd --input_binary true
The output file will be frozen_graph.pb
3. Quantization
To quantize the model, we run this command:
vai_q_tensorflow quantize --input_frozen_graph frozen/frozen_graph.pb --input_fn calibration.calib_input --output_dir quantization/ --input_nodes yolov3-tinynet1 --output_nodes yolov3-tinyconvolutional13/BiasAdd --input_shapes ?,416,416,3 --calib_iter 100
where
input_fn: the function that feeds data to the calibration process. It must return a Python dictionary whose key is the name of the model's input node and whose value is a batch of image arrays. In the command above, calibration.calib_input refers to the calib_input function in calibration.py.
We create calibration.py as follows:
import os
import cv2
import glob
import numpy as np

dataset_path = "directory of the dataset images"
calib_batch_size = 10
inputsize = {'h': 416, 'c': 3, 'w': 416}

# Resize an image channel by channel to the model input size
def convertimage(img, w, h, c):
    new_img = np.zeros((h, w, c))
    for idx in range(c):
        resize_img = img[:, :, idx]
        resize_img = cv2.resize(resize_img, (w, h), interpolation=cv2.INTER_AREA)
        new_img[:, :, idx] = resize_img
    return new_img

# This function reads one batch of images from the dataset and returns them
# in a dictionary keyed by the name of the model's input node.
def calib_input(iter):
    images = []
    line = glob.glob(dataset_path + "/*.png")
    for index in range(0, calib_batch_size):
        curline = line[iter * calib_batch_size + index]
        calib_image_name = curline.strip()
        image = cv2.imread(calib_image_name)
        image = convertimage(image, inputsize["w"], inputsize["h"], inputsize["c"])
        image = image / 255.0  # normalise pixel values to [0, 1]
        images.append(image)
    return {"yolov3-tinynet1": images}  # key is the input node of the model
4. Compile model
To compile the model, we run this command:
vai_c_tensorflow --frozen_pb quantize/quantize_eval_model.pb -a arch.json -o yolov3tinyHelmet -n yolov3tinyHelmet
where
frozen_pb: the .pb file from the quantization process (quantize_eval_model.pb)
a: the JSON file (arch.json) that describes the DPU architecture:
{
"fingerprint":"0x1000020F6014406"
}
Finally, the output of the compilation process is yolov3tinyHelmet.xmodel.
Customizing Smart Camera Application
The smart camera application is built on the Vitis Video Analytics SDK (VVAS) framework. We need to set up the VVAS plugin configuration and the DPU configuration.
In the end, we will have the following configuration files: yolov3tinyHelmet.prototxt, preprocess.json, aiinference.json, label.json and drawresult.json.
1. DPU configuration
We create yolov3tinyHelmet.prototxt for the DPU configuration as follows (a short sketch after the listing shows how the biases values map to anchor boxes):
model {
name: "yolov3tinyHelmet"
kernel {
name: "yolov3tinyHelmet"
mean: 0
mean: 0
mean: 0
scale: 0.25
scale: 0.25
scale: 0.25
}
model_type : YOLOv3
yolo_v3_param {
num_classes: 2
anchorCnt: 3
conf_threshold: 0.3
nms_threshold: 0.45
layer_name:"yolov3-tinyconvolutional13/BiasAdd/aquant"
layer_name:"yolov3-tinyconvolutional10/BiasAdd/aquant"
biases: 10
biases: 14
biases: 23
biases: 27
biases: 37
biases: 58
biases: 81
biases: 82
biases: 135
biases: 169
biases: 344
biases: 319
test_mAP: false
}
is_tf : true
}
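For reference, the twelve biases entries are just the six anchor boxes written as a flat list of width/height pairs (these values match the default YOLOv3-Tiny anchors). A small sketch of how they group:

biases = [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]

# Group the flat list into (width, height) anchor pairs
anchors = [(biases[i], biases[i + 1]) for i in range(0, len(biases), 2)]
print(anchors)
# [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]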
2. Preprocess configuration
We create preprocess.json to configure the pre-processing performed before inference:
{
"xclbin-location":"/lib/firmware/xilinx/kv260-smartcam/kv260-smartcam.xclbin",
"ivas-library-repo": "/opt/xilinx/lib",
"kernels": [
{
"kernel-name": "pp_pipeline_accel:pp_pipeline_accel_1",
"library-name": "libivas_xpp.so",
"config": {
"debug_level" : 1,
"mean_r": 0,
"mean_g": 0,
"mean_b": 0,
"scale_r": 0.25,
"scale_g": 0.25,
"scale_b": 0.25
}
}
]
}
3. AI inference configuration
In aiinference.json, we set model-name to "yolov3tinyHelmet" and model-class to "YOLOV3":
{
"xclbin-location":"/lib/firmware/xilinx/kv260-smartcam/kv260-smartcam.xclbin",
"ivas-library-repo": "/usr/lib/",
"element-mode":"inplace",
"kernels" :[
{
"library-name":"libivas_xdpuinfer.so",
"config": {
"model-name" : "yolov3tinyHelmet",
"model-class" : "YOLOV3",
"model-path" : "/home/petalinux",
"run_time_model" : false,
"need_preprocess" : false,
"performance_test" : false,
"debug_level" : 0
}
}
]
}
4. Label configuration
We create label.json to define the class names as follows:
{
"model-name": "yolov3tinyHelmet",
"num-labels": 2,
"labels" :[
{
"name": "helmet",
"label": 0,
"display_name" : "helmet"
},
{
"name": "head",
"label": 1,
"display_name" : "head"
}
]
}
5. Drawresult configuration
This drawresult.json is used on the KV260 to render the output results. You can change the font size, font thickness, and the colours of the labels and bounding boxes.
{
"xclbin-location":"/usr/lib/dpu.xclbin",
"ivas-library-repo": "/opt/xilinx/lib",
"element-mode":"inplace",
"kernels" :[
{
"library-name":"libivas_airender.so",
"config": {
"fps_interval" : 10,
"font_size" : 2,
"font" : 1,
"thickness" : 2,
"debug_level" : 0,
"label_color" : { "blue" : 0, "green" : 0, "red" : 0 },
"label_filter" : [ "class", "probability" ],
"classes" : [
{
"name" : "helmet",
"blue" : 255,
"green" : 0,
"red" : 0
},
{
"name" : "head",
"blue" : 0,
"green" : 255,
"red" : 0
}]
}
}
]
}
Finally, here is how we deploy all of these configuration files and the compiled model to the KV260 board.
1. We upload the yolov3tinyHelmet directory to the board with this command:
scp -r yolov3tinyHelmet petalinux@192.168.0.150:~/
where our board IP is 192.168.0.150
On the KV260, this directory is located in /home/petalinux.
2. To use yolov3tinyHelmet in the application, we replace the aiinference.json, preprocess.json and drawresult.json of the SSD model with our configuration files:
sudo cp yolov3tinyHelmet/aiinference.json /opt/xilinx/share/ivas/smartcam/ssd/aiinference.json
sudo cp yolov3tinyHelmet/preprocess.json /opt/xilinx/share/ivas/smartcam/ssd/preprocess.json
sudo cp yolov3tinyHelmet/drawresult.json /opt/xilinx/share/ivas/smartcam/ssd/drawresult.json
Now this model replaces the SSD model of the smart camera application, which is why we still launch the application with --aitask ssd. To run it, we execute:
sudo xmutil unloadapp
sudo xmutil loadapp kv260-smartcam
sudo smartcam --usb 0 -W 1920 -H 1080 --target rtsp --aitask ssd
Results
This is the output video that the application saves to a file.
This is the result when the input comes from the webcam. We point the webcam at the test video displayed on a monitor; the output video from the application is streamed back to the computer and displayed next to the input video.
Conclusion
As the results show, our detection system using YOLOv3-Tiny works well on the KV260 and can detect helmets in real time. However, the model still needs improvement, as some objects were missed and others were occasionally misclassified.