In modern life, automobiles have brought great convenience, but they have also created problems in areas such as traffic violations, public safety, and driving safety. Effective detection and monitoring of vehicles can reduce the workload of traffic law enforcement officers, strengthen accountability for traffic accidents, and help drivers avoid collisions.
Things used in this project
Hardware components
- AMD-Xilinx Kria KV260 Vision AI Starter Kit
- AMD-Xilinx Kria KV260 Basic Accessory Pack
Software apps and online services
- AMD-Xilinx Vitis AI
Introduction: Object detection is a research field with a wide range of application scenarios. Powerful object detection capabilities can greatly improve the digitization, automation and security of people's lives. Object detection refers to locating and identifying objects in complete images and extracting the required information from them. Vehicle detection is an important part of this field. In recent years, neural networks have brought rapid progress to object detection. Among the available approaches, the YOLO algorithm balances accuracy and real-time performance and has become an important solution for object detection. You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. Prior detection systems repurpose classifiers or localizers to perform detection: they apply the model to an image at multiple locations and scales, and high-scoring regions of the image are considered detections. YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region; these bounding boxes are weighted by the predicted probabilities. YOLOv3 is extremely fast and accurate. Even though the YOLO algorithm has many advantages, as a deep learning algorithm its computational load is still very large. To deploy object detection more widely, a low-power, easy-to-deploy, and cost-effective system is needed. Therefore, object detection systems should run on edge devices.
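As a rough illustration of the post-processing behind "bounding boxes weighted by the predicted probabilities" (confidence filtering followed by non-maximum suppression), consider the minimal NumPy sketch below. It is a generic example, not the exact post-processing used later by the DPU or the VVAS plugin; boxes are assumed to be in [x1, y1, x2, y2] format.

import numpy as np

def filter_and_nms(boxes, scores, conf_thr=0.5, iou_thr=0.45):
    # drop low-confidence boxes, then suppress heavily overlapping ones
    keep = scores >= conf_thr
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort()[::-1]  # highest confidence first
    selected = []
    while order.size > 0:
        i = order[0]
        selected.append(i)
        rest = order[1:]
        # intersection-over-union between the best box and the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou < iou_thr]  # keep only boxes that do not overlap too much
    return boxes[selected], scores[selected]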
Aim: In this project, YOLO is implemented on the FPGA edge device Kria KV260 from AMD-Xilinx. Since YOLO was designed for GPUs, the effort required to run such a network on an FPGA and the resulting performance impact are of major interest.
Approach: The application requires a camera interface and a deep learning processing unit. To test these hardware parts, a given example project that uses them was run on the Kria board first. Then AMD-Xilinx's Vitis AI is used to compile the YOLO network for the Deep-Learning Processor Unit (DPU) on the FPGA, with minor adjustments to the network. The included Vitis AI Runtime Engine with its Python API communicates with the DPU via an embedded Linux running on the FPGA's microprocessor.
Conclusion: YOLOv3 requires about 20 GOPs per frame. Nevertheless, a throughput of 30 frames per second could be achieved on the KV260.
There are already some successful implementations of YOLO for object detection on edge devices, but they compromise between accuracy and real-time performance. By using the KV260's DPU for hardware acceleration of the neural network, a balance between the two can be achieved.
The Vitis AI framework from AMD-Xilinx has been extensively tested and shows its strengths, but also some teething problems. For running deep neural networks on FPGAs, Vitis AI is a framework with a good trade-off between development time and performance. It should be considered before implementing hardware-accelerated algorithms in HDL or HLS.
Prerequisites
- Linux host PC with Vitis AI installed
- Knowledge of the Vitis AI workflow
- Internet access for the KV260
Convolutional neural networks require trained parameters (weights) to output classification results. Download the yolov3 model configuration and weights from the darknet framework: https://pjreddie.com/darknet/yolo/
Use the DW2TF (https://github.com/jinyu121/DW2TF ) to convert the weights of yolov3 from the darknet framework to the TensorFlow format. After converting, you will get .pb and .ckpt files.
The usage of DW2TF is as follows:
First, make sure that TensorFlow is installed in your environment. If it is not installed, you can use the following command in the terminal.
pip3 install tensorflow
Clone the DW2TF project to your host, and create a folder in its directory to store the .weights file and .cfg file, here we name the folder "yolov3".
Then execute the following command in the main directory of the DW2TF project to convert.
python3 main.py --cfg 'yolov3/yolov3.cfg' --weights 'yolov3/yolov3.weights' --output 'yolov3/' --gpu 0
The terminal output of a successful conversion and the resulting files in the converted yolov3 folder are shown in the figures.
Copy the newly generated file to the project directory.
You can directly use the converted weights for the following steps, or use them as a pre-trained model and continue training with your own dataset. Since the focus of this project is deployment on the KV260, the training process is not covered here. In any case, you need a trained yolov3 model.
Vitis AI
The Vitis™ AI development environment accelerates AI inference on Xilinx® hardware platforms, including both Edge devices and Alveo™ accelerator cards. It consists of optimized IP cores, tools, libraries, models, and example designs. It is designed with high efficiency and ease of use in mind to unleash the full potential of AI acceleration on Xilinx FPGAs and on adaptive compute acceleration platforms (ACAPs). It makes it easier for users without FPGA knowledge to develop deep-learning inference applications by abstracting the intricacies of the underlying FPGA and ACAP.
In the Vitis AI 1.4 Model Zoo, a variety of neural network models for three popular frameworks, Caffe, TensorFlow and PyTorch, are provided. For every model, a .yaml file is released that describes the model name, framework, task type, network backbone, training & validation dataset, float OPs, whether it is pruned, download link, license, and md5 checksum. You can browse the model list in the Vitis AI 1.4 Model Zoo, select a neural network model you are interested in, get its basic information from the corresponding .yaml file, and download the model freely via the link in the .yaml file.
Install Vitis AI on the Linux host and run the Vitis AI Docker container from the /Vitis-AI directory.
The Docker container starts successfully as shown below.
After getting the trained TensorFlow model, i.e. the .pb and .ckpt files, you need to create a frozen graph to consolidate all TensorFlow files into a single file.
After starting the container, activate the Conda environment "vitis-ai-tensorflow", then change the working directory to the directory where the converted files are stored.
Use the freeze_graph tool provided by TensorFlow to perform the conversion.
An example of command line usage is as follows:
freeze_graph --input_graph yolov3.pb --input_checkpoint yolov3.ckpt \
--input_binary true --output_graph frozen_graph.pb \
--output_node_names network/convolutional75/BiasAdd
Before running "freeze_graph" command, we need to know these parameters.
--input_graph: the .pb file
--input_checkpoint: the .ckpt file
--output_graph: the result file produced by freeze_graph
--output_node_names: the output node of the model (from the .pb file). You can find the output node by uploading the .pb file to https://netron.app/ and inspecting the model structure.
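As an alternative to Netron, a short TensorFlow snippet can print all node names of the converted graph so you can spot the input and output nodes. The file path below is an assumption and should point to your converted .pb file.

import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with open("yolov3/yolov3.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
# the first and last printed nodes are usually the input and output nodes
for node in graph_def.node:
    print(node.name, node.op)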
Then you will get frozen_graph.pb in the directory.
Quantization
The process of inference is computation intensive and requires a high memory bandwidth to satisfy the low-latency and high-throughput requirements of edge applications.
Quantization and channel pruning techniques are employed to address these issues while achieving high performance and high energy efficiency with little degradation in accuracy. Quantization makes it possible to use integer computing units and to represent weights and activations by lower bits, while pruning reduces the overall required operations.
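As a rough, framework-independent illustration of what quantization does (not the actual vai_q_tensorflow algorithm), the sketch below maps a float32 tensor to int8 with a single per-tensor scale and measures the rounding error this introduces.

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)          # toy float32 "weights"
scale = np.abs(w).max() / 127.0                       # symmetric 8-bit scale
w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale         # back to float for comparison
print("max absolute quantization error:", np.abs(w - w_dequant).max())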
Vitis AI provides a Docker container for quantization tools, including vai_q_tensorflow. After running a container, activate the Conda environment "vitis-ai-tensorflow".
Quantizing the model also requires a calibration set, which is usually a subset of the training/validation dataset or actual application images. Use the following code to provide the calibration input function that the quantizer needs. This code comes from this website: https://www.hackster.io/hdcoe/running-yolov2-tiny-on-kv260-28f801#toc-3--quantization-2
import os
import cv2
import glob
import numpy as np

# set data path to your own dataset
dataset_path = "/workspace/data/VOCdevkit/VOC2007/JPEGImages"
# set input size
inputsize = {'h': 416, 'c': 3, 'w': 416}
# set input node name
input_node = "yolov2-tinynet1"
calib_batch_size = 10

def convertimage(img, w, h, c):
    # resize each channel to the network input size
    new_img = np.zeros((h, w, c))
    for idx in range(c):
        resize_img = img[:, :, idx]
        resize_img = cv2.resize(resize_img, (w, h), interpolation=cv2.INTER_AREA)
        new_img[:, :, idx] = resize_img
    return new_img

# This function reads a batch of images from the dataset and returns them keyed by the input node name
def calib_input(iter):
    images = []
    line = glob.glob(dataset_path + "/*.j*")  # either .jpg or .jpeg
    for index in range(0, calib_batch_size):
        curline = line[iter * calib_batch_size + index]
        calib_image_name = curline.strip()
        image = cv2.imread(calib_image_name)
        image = convertimage(image, inputsize["w"], inputsize["h"], inputsize["c"])
        image = image / 255.0
        images.append(image)
    return {input_node: images}  # first layer
Save this as calibration.py in the project directory, and change dataset_path, input_node and inputsize to match your model.
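Before handing this function to the quantizer, it can be worth running it once on the host as a quick sanity check; this assumes calibration.py is in the current directory and dataset_path points to existing images.

from calibration import calib_input, input_node

batch = calib_input(0)              # build the first calibration batch
print(len(batch[input_node]))       # should equal calib_batch_size
print(batch[input_node][0].shape)   # should match the network input, e.g. (416, 416, 3)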
Run the following command to quantize the model. It will take some time, depending on the value of calib_iter and the model size.
vai_q_tensorflow quantize \
--input_frozen_graph frozen_graph.pb \
--input_nodes network/net1 \
--input_shapes ?,416,416,3 \
--output_nodes network/convolutional75/BiasAdd \
--input_fn calibration.calib_input \
--output_dir quantize/ \
--calib_iter 100
--input_frozen_graph: the output of the freeze_graph command (frozen_graph.pb).
--input_nodes: you can get this with https://netron.app/ from frozen_graph.pb. In our work, the input node is "network/net1".
--output_nodes: you can get this with https://netron.app/ from frozen_graph.pb. In our work, the output node is "network/convolutional75/BiasAdd".
--input_shapes: the shape of the model input (here ?,416,416,3).
--input_fn: the calibration input function, given as module.function (here calibration.calib_input). The function must return a dictionary whose key is the input node name and whose value is a list of preprocessed image arrays.
--calib_iter: the number of calibration iterations; calib_iter × calib_batch_size must not exceed the number of images in the dataset (a quick check is sketched after this list).
--output_dir: the directory in which to save the quantization output. In this work, it is set to the quantize directory.
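A quick way to check how many calibration iterations your dataset supports, assuming the same dataset_path and calib_batch_size as in calibration.py:

import glob

dataset_path = "/workspace/data/VOCdevkit/VOC2007/JPEGImages"
calib_batch_size = 10
num_images = len(glob.glob(dataset_path + "/*.j*"))
# calib_iter * calib_batch_size must not exceed the number of images
print("maximum calib_iter:", num_images // calib_batch_size)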
The terminal output after executing the quantization command is shown in the figure. You will then find quantize_eval_model.pb in the quantize directory.
Compiling the model is the final step for deploying it to run on the DPU of the KV260.
Before compiling, we need:
--frozen_pb: the .pb file from the quantization process (quantize_eval_model.pb)
-a: a JSON file that describes the DPU architecture. You need to create an arch.json file as follows:
{
"fingerprint":"0x1000020F6014406"
}
Then save as arch.json to the project directory.
Finally, the Compile command is:
vai_c_tensorflow --frozen_pb quantize/quantize_eval_model.pb -a arch.json -o yolov3 -n yolov3
-o: the output directory after compiling.
-n: the name of the model.
Then you can see yolov3.xmodel, md5sum.txt, meta.json in the yolov3 directory.
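If compilation succeeds, the xir Python package that ships with Vitis AI can be used to confirm that the compiled model contains a DPU subgraph. This is only a sketch based on the standard VART examples; the file path is an assumption.

import xir

graph = xir.Graph.deserialize("yolov3/yolov3.xmodel")
root = graph.get_root_subgraph()
# subgraphs mapped to the DPU carry a "device" attribute with the value "DPU"
for sub in root.toposort_child_subgraph():
    if sub.has_attr("device"):
        print(sub.get_name(), sub.get_attr("device"))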
Unfortunately, this project hit an error when compiling yolov3, and the cause has not yet been identified. Although tiny-yolov2 compiled successfully, its detection results were not good, so the network models officially provided by Xilinx were used in the end.
Deploying the YOLO network based on the smartcam app
This section introduces how to deploy the model compiled above on the KV260 through the smartcam application. The smartcam application uses the Vitis Video Analytics SDK (VVAS) to control the DPU, so we need to set up two parts: the DPU plugin configuration and the VVAS configuration. After preparing the correct configuration files, deploy them on the KV260 board.
Configuration files
DPU configuration:
The configuration of the DPU requires a .prototxt file, the specific format is shown as below.
model {
name: "yolo-v3"
kernel {
name: "yolo-v3"
mean: 0.0
mean: 0.0
mean: 0.0
scale: 0.00390625
scale: 0.00390625
scale: 0.00390625
}
model_type : YOLOv3
yolo_v3_param {
num_classes: 1
anchorCnt: 5
conf_threshold: 0.5
nms_threshold: 0.7
biases: 10
biases: 13
biases: 16
biases: 30
biases: 33
biases: 23
biases: 30
biases: 61
biases: 62
biases: 45
biases: 59
biases: 119
biases: 116
biases: 90
biases: 156
biases: 198
biases: 373
biases: 326
test_mAP: false
}
is_tf : true
}
You can change the model name and the bias values. The bias values come from the anchors parameter in the darknet configuration file yolov3.cfg; a small script for extracting them is sketched below.
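If you want to regenerate the biases list from your own configuration, a script like the one below can pull the anchors entry out of the .cfg file and print it in the .prototxt format; the file path is an assumption.

import re

with open("yolov3/yolov3.cfg") as f:
    cfg = f.read()
# every [yolo] section of yolov3.cfg repeats the same anchors line
anchors = re.search(r"anchors\s*=\s*([0-9,\s]+)", cfg).group(1)
for value in anchors.split(","):
    value = value.strip()
    if value:
        print("biases:", value)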
VVAS configuration:
VVAS plugin requires four configuration files, namely preprocess.json, aiinference.json, label.json and drawresult.json.
preprocess.json
This file is used before the inference process.
{
"xclbin-location":"/lib/firmware/xilinx/kv260-smartcam/kv260-smartcam.xclbin",
"ivas-library-repo": "/opt/xilinx/lib",
"kernels": [
{
"kernel-name": "pp_pipeline_accel:pp_pipeline_accel_1",
"library-name": "libivas_xpp.so",
"config": {
"debug_level" : 1,
"mean_r": 0,
"mean_g": 0,
"mean_b": 0,
"scale_r": 0.25,
"scale_g": 0.25,
"scale_b": 0.25
}
}
]
}
aiinference.json
You need to change parameters as follows:
"model-name": The name of your model.
"model-class": VVAS provides different classes including YOLOV3, FACEDETECT, CLASSIFICATION, SSD, REFINEDET, TFSSD, YOLOV2
"model-path": This is the path of your project in the KV260. I will save my model (.xmodel) in /home/petalinux on KV260.
{
"xclbin-location":"/lib/firmware/xilinx/kv260-smartcam/kv260-smartcam.xclbin",
"ivas-library-repo": "/usr/lib/",
"element-mode":"inplace",
"kernels" :[
{
"library-name":"libivas_xdpuinfer.so",
"config": {
"model-name" : "yolo-v3",
"model-class" : "YOLOV3",
"model-path" : "/home/petalinux",
"run_time_model" : false,
"need_preprocess" : false,
"performance_test" : false,
"debug_level" : 1
}
}
]
}
label.json
{
"model-name": "yolo-v3",
"num-labels": 1,
"labels" :[
{
"label": 2,
"name":"car",
"display_name":"car"
},
{
"name": "person",
"label": 0,
"display_name" : "person"
}
]
}
In this file, only 2 classes are shown to illustrate the structure of the labels parameter. You can continue up to 20 classes, since the yolov3 model used here detects 20 classes.
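Instead of writing all the entries by hand, the labels list can also be generated with a short script. The class names and their order below are placeholders; the label indices must match the class order your model was trained with.

import json

class_names = ["person", "bicycle", "car"]  # extend to the full class list of your model
labels = [{"label": i, "name": n, "display_name": n} for i, n in enumerate(class_names)]
label_json = {"model-name": "yolo-v3", "num-labels": len(labels), "labels": labels}
with open("label.json", "w") as f:
    json.dump(label_json, f, indent=4)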
drawresult.json
{
"xclbin-location":"/usr/lib/dpu.xclbin",
"ivas-library-repo": "/opt/xilinx/lib",
"element-mode":"inplace",
"kernels" :[
{
"library-name":"libivas_airender.so",
"config": {
"fps_interval" : 10,
"font_size" : 2,
"font" : 1,
"thickness" : 2,
"debug_level" : 0,
"label_color" : { "blue" : 0, "green" : 0, "red" : 255 },
"label_filter" : [ "class", "probability" ],
"classes" : [
{
"name" : "person",
"blue" : 255,
"green" : 0,
"red" : 0
}
]
}
}
]
}
You can continue up to 20 classes, and you can change "font_size", "font", "thickness" and the colors. Make sure all class names are the same as the label names; a small script for generating these entries is sketched below.
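The "classes" array can be generated in the same way as label.json; the names must match label.json and the colors below are arbitrary placeholders.

import json

class_names = ["person", "car"]                    # must match the names in label.json
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # (blue, green, red) per class
classes = [{"name": n, "blue": b, "green": g, "red": r}
           for n, (b, g, r) in zip(class_names, palette)]
print(json.dumps(classes, indent=4))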
Finally, you will have the following files in the yolov3 directory. Make sure .xmodel and .prototxt have the same name as the directory name.
Deploying on the KV260 board
Before deploying in this section, please make sure that you have flashed the KV260 image onto the SD card. If you haven't, please refer to this guide: https://www.xilinx.com/products/som/kria/kv260-vision-starter-kit/kv260-getting-started-ubuntu/setting-up-the-sd-card-image.html
According to the official Xilinx tutorial, connect the SD card, USB cable, IAS camera module, monitor HDMI cable and power supply to the KV260 board in turn, and it will start normally. This project uses the MobaXterm software to operate the KV260 board.
Before connecting the power, open MobaXterm, create a new session, and select Serial. For the serial port, select the COM port corresponding to the KV260.
Open the session and turn on the power.
The serial console will display the LAN IP address of the board. Create a new session based on this IP address, select SSH, and fill in the IP address as the remote host (here 192.168.0.104). After the session is created, open it and enter the password to log into the KV260 board.
Get the latest application package (refer to https://xilinx.github.io/kria-apps-docs/main/build/html/docs/smartcamera/docs/app_deployment.html ).
Check the package feed for new updates.
sudo dnf update
Confirm with “Y” when prompted to install new or updated packages.
Sometimes it is needed to clean the local dnf cache first. To do so, run:
sudo dnf clean all
Get the list of available packages in the feed.
sudo xmutil getpkgs
Install the package with dnf install:
sudo dnf install packagegroup-kv260-smartcam.noarch
The system will ask "Is this ok [y/N]:" before downloading the packages; type "y" to proceed.
Dynamically load the application package.
The firmware consists of the bitstream, device tree overlay (dtbo) and xclbin file. The firmware is loaded dynamically on user request once Linux is fully booted; the xmutil utility can be used for that purpose.
Show the list and status of available acceleration platforms and AI Applications:
sudo xmutil listapps
Switch to a different platform for different AI Application:
- When xmutil listapps shows that there’s no active accelerator, just activate kv260-smartcam.
sudo xmutil loadapp kv260-smartcam
- When there’s already another accelerator being activated apart from kv260-smartcam, unload it first, then switch to kv260-smartcam.
sudo xmutil unloadapp
sudo xmutil loadapp kv260-smartcam
Deploy detection network based on smartcam
You need to pay attention to two directories:
- /opt/xilinx/share/vitis_ai_library/models/kv260-smartcam/
The label.json, .prototxt and .xmodel files are placed in this directory.
- /opt/xilinx/share/ivas/smartcam/
The aiinference.json, drawresult.json and preprocess.json files are placed in this directory.
After the file is placed, use the following command to run the application.
sudo smartcam -f /home/petalinux/data/video1.nv12.h264 -i h264 -W 1920 -H 1080 -t dp -r 30 -a yolov3
The input can use a file or a mipi camera. If you use a mipi camera, change -f to -mipi.
The output can use display, RTSP or file. The parameters corresponding to -t are dp, rtsp, file.
Transcode the video to an H.264 file
In order for the board to correctly decode the input video, we need to transcode .mp4 videos to the .h264 format. You need to download the ffmpeg source from http://ffmpeg.org/download.html and install these libraries:
sudo apt-get install yasm
sudo apt-get install nasm
sudo apt-get install libx264-dev
sudo apt-get install libx265-dev
sudo apt-get install libfdk-aac-dev
sudo apt-get install libmp3lame-dev
sudo apt-get install libopus-dev
Unzip the downloaded ffmpeg, enter the unzipped folder, and execute the following commands in sequence.
./configure
make
sudo make install
Enter the following command to see if the installation is successful.
ffmpeg -version
After the installation is successful, use the following command to transcode the video.
ffmpeg -i input-video.mp4 -c:v libx264 -pix_fmt nv12 -vf scale=1920:1080 \
-r 30 output.nv12.h264
You need to modify the input and output video names to suit the video you want to use for detection.
Here is the video demo: https://www.bilibili.com/video/BV1PP4y1K7FG?p=1
Detection effect
This project deploys several different yolov3 network weights on the KV260 board and detects vehicles in videos of different scenarios. This section presents and analyzes the results. Screenshots of the test results are shown below.
Figure Vehicle Detection in Overpass Scenarios
Among them, (a) is the original image, (b) is the detection result of the yolov3_bdd network, (c) is that of the yolov3_cityscapes network, and (d) is that of the yolov3_voc network. The yolov3_bdd network performs excellently: most of the cars are detected, at a frame rate of about 8 frames per second. The yolov3_voc network detects the larger cars in the image but barely detects the smaller cars in the distance, also at about 8 frames per second. The yolov3_cityscapes network recognizes poorly, but its frame rate can reach 40 frames per second.
The reason for these results is that the yolov3_cityscapes network is only about 3 MB, while the yolov3_bdd and yolov3_voc networks are about 65 MB. Smaller networks usually struggle to achieve high-precision detection but are fast, whereas larger networks tend to be more accurate at the cost of real-time performance.
Figure Vehicle detection in Parking lot Scenarios
Among them, (a) is the original image, (b) is the detection result of the yolov3_bdd network, (c) is that of the yolov3_cityscapes network, and (d) is that of the yolov3_voc network. All three networks perform well, and most of the cars are detected. For the same target, the confidence of the yolov3_cityscapes network is significantly lower than that of the yolov3_voc and yolov3_bdd networks, which is consistent with the effect of model size on accuracy described above. The yolov3_bdd network performs worse than yolov3_voc here, which is the opposite of the result in the first scene.
The detection results in the two scenarios differ. To find out why, we chose a third scenario for comparison; the results are as follows.
Figure Vehicle Detection in Highway Scenarios
Analyzing the detection results in the highway scene, combined with the datasets used to train the different networks, shows that the yolov3_bdd network is better suited to vehicle detection in top-down images shot from a high vantage point, while the yolov3_voc and yolov3_cityscapes networks are better suited to vehicle detection from an eye-level perspective.
Conclusion
To sum up, when deploying a vehicle detection network on the KV260, differences in application scenario, network size, and training dataset significantly affect the final detection results. If the KV260 is deployed at a monitoring location with a top-down view, a network such as yolov3_bdd is suitable; if it is deployed at eye level, for example in a dashcam, the yolov3_cityscapes and yolov3_voc networks are more suitable; and if it is deployed in a scenario with strict real-time requirements, a lightweight network such as yolov3_cityscapes should be selected. For a specific deployment scenario, the best approach is to build a dataset for that scenario and train the most suitable YOLO network to obtain good detection results.
Prospect
This project can monitor vehicles in real time, but with somewhat reduced accuracy; alternatively, a larger network can be used for high-precision detection, but then real-time performance is difficult to guarantee.
Next, this project will be improved in the following aspects:
1. Improve the YOLO algorithm to balance real-time performance and accuracy, and achieve high-precision real-time detection in fixed application scenarios.
2. Preprocess the video to ensure the robustness of the system under different lighting and weather.
3. Add other functions besides detection to meet more complex needs, such as vehicle counting, red-light-running detection, counting vacant parking spaces in parking lots, and parking path planning.