We conducted object detection on the KR260's DPU using the lightweight "YOLOX-nano" model with PyTorch.
We also compared the execution speed of YOLOX and YOLOv3.
This project is part of a subproject for the AMD Pervasive AI Developer Contest.
Be sure to check out the other projects as well.
***The main project is currently under submission.***
0. Main project << under submission
2. PYNQ + PWM(DC-Motor Control)
3. Object Detection(Yolo) with DPU-PYNQ
4. Implementation DPU, GPIO, and PWM
6. GStreamer + OpenCV with 360° Camera
7. 360 Live Streaming + Object Detect(DPU)
8. ROS2 3D Marker from 360 Live Streaming
9. Control 360° Object Detection Robot Car
10. Improve Object Detection Speed with YOLOX << this project
11. Benchmark Architectures of the DPU
12. Power Consumption of 360° Object Detection Robot Car
13. Application to Vitis AI ONNX Runtime Engine (VOE)
14. Appendix: Object Detection Using YOLOX with a Webcam
Please note that before running the above subprojects, the following setup, which is the reference design for this AMD contest, is required:
https://github.com/amd/Kria-RoboticsAI
Introduction
We conducted object detection using the KR260's DPU with the lightweight "YOLOX-nano" model and PyTorch.
We created a program (.ipynb and .py) that runs on PYNQ, confirmed its operation, and compared its detection speed with the old YOLOv3 program.
Below is the test video with the execution and speed comparison. We confirmed that the new YOLOX program is approximately five times faster.
YOLOX
A pre-trained, quantized model is provided as a sample by Xilinx (AMD).
The sample used is available here:
https://github.com/Xilinx/Vitis-AI/tree/master/model_zoo/model-list/pt_yolox-nano_3.5
YOLOv3
There is a YOLOv3-tiny sample program as part of the DPU-PYNQ samples. The method to execute it in the KR260+DPU environment is introduced in the following article:
3. Object Detection(Yolo) with DPU-PYNQ
However, YOLOv3 is quite old and proved slow in detection speed during actual use.
Therefore, we used the newer, lightweight YOLOX-nano model and ran a benchmark comparison for speed.
Creating the YOLOX-nano Model with Vitis AI
First, we created (compiled) the YOLOX model for the KR260 in a Linux environment.
We downloaded and extracted the sample model of YOLOX.
wget https://www.xilinx.com/bin/public/openDownload?filename=pt_yolox-nano_3.5.zip
unzip openDownload?filename=pt_yolox-nano_3.5.zip
Compilation with Vitis AI
We used Vitis AI for compilation, launching the CPU version of the Vitis AI PyTorch container.
cd Vitis-AI/
./docker_run.sh xilinx/vitis-ai-pytorch-cpu:latest
Using an arch.json containing the DPU fingerprint as an argument, we compiled the model.
The .xmodel file is created in the output folder after compilation.
cd pt_yolox-nano_3.5/
conda activate vitis-ai-pytorch
echo '{' > arch.json
echo ' "fingerprint": "0x101000016010407"' >> arch.json
echo '}' >> arch.json
vai_c_xir -x quantized/YOLOX_0_int.xmodel -a arch.json -n yolox_nano_pt -o ./yolox_nano_pt
This arch.json uses the fingerprint of the B4096 DPU on the KR260 as an example.
If you want to try different architectures such as B512 or B1024, you can find the fingerprint in the arch.json generated when the DPU is synthesized with Vitis; it is located in the following folder:
~/***_hw_link/Hardware/dpu.build/link/vivado/vpl/prj/prj.gen/sources_1/bd/design_1/ip/design_1_DPUCZDX8G_1_0/arch.json
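If you only need to confirm which fingerprint your synthesized DPU uses, you can read it out of that file directly. Below is a minimal sketch; the path is just an example, so point it at your own project's arch.json:

import json

# Read the DPU fingerprint from the arch.json generated during synthesis.
# Replace the path with the arch.json from your own hardware project.
with open("arch.json") as f:
    arch = json.load(f)

print(arch["fingerprint"])  # e.g. "0x101000016010407" for B4096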
We created a program that runs on PYNQ, in both .ipynb and .py formats.
Since the algorithm differs from the YOLOv3 sample program, some modifications were necessary. The actual program can be found on the following GitHub repository:
The process involves the typical YOLOX-nano pipeline: preprocessing, DPU input/output, post-processing, and drawing bounding boxes (BBOX). DPU inference replaces the usual CPU or GPU inference, and the execution time of each stage is measured.
Key Part of the Program
def run(image_index, display=False):
    # Pre-processing
    input_shape = (416, 416)
    input_image = cv2.imread(os.path.join(image_folder, original_images[image_index]))
    image_data, ratio = preprocess(input_image, input_shape)

    # DPU inference
    image[0, ...] = image_data.reshape(shapeIn[1:])
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)

    # Post-processing
    outputs = np.concatenate([output.reshape(1, -1, output.shape[-1]) for output in output_data], axis=1)
    bboxes, scores, class_ids = postprocess(outputs, input_shape, ratio, nms_th, nms_score_th, image_width, image_height)

    if display:
        # Combine boxes, scores, and class IDs for draw_bbox (assumed layout)
        bboxes_with_scores_and_classes = np.column_stack((bboxes, scores, class_ids))
        result_image = draw_bbox(input_image, bboxes_with_scores_and_classes, class_names)
        cv2.imwrite(os.path.join("img/", "result.jpg"), result_image)

    return bboxes, scores, class_ids
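For reference, run() assumes the DPU runner and its I/O buffers were prepared beforehand. Here is a minimal sketch of that setup following the standard DPU-PYNQ flow; the bitstream and model file names are examples:

import numpy as np
from pynq_dpu import DpuOverlay

# Load the DPU bitstream and the compiled YOLOX-nano model (example names).
overlay = DpuOverlay("dpu.bit")
overlay.load_model("yolox_nano_pt.xmodel")
dpu = overlay.runner

# Query tensor shapes and allocate the buffers that run() uses.
shapeIn = tuple(dpu.get_input_tensors()[0].dims)
shapeOut = [tuple(t.dims) for t in dpu.get_output_tensors()]
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
output_data = [np.empty(s, dtype=np.float32, order="C") for s in shapeOut]
image = input_data[0]  # written by run() before each inference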
Testing on KR260
Below is the test video mentioned at the beginning.
We opened the .ipynb file on the KR260 via a web browser.
We checked the input/output tensors of the model converted with PyTorch, confirming (1, 416, 416, 3) → (1, 52, 52, 85), (1, 26, 26, 85), (1, 13, 13, 85).
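These shapes can also be printed directly from the DPU runner, for example:

# Check the model's input and output tensor shapes on the DPU.
print(tuple(dpu.get_input_tensors()[0].dims))    # (1, 416, 416, 3)
for t in dpu.get_output_tensors():
    print(tuple(t.dims))                         # (1, 52, 52, 85) etc.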
The YOLOX model detected 80 categories of COCO objects.
We compared the detection speed between YOLOX-nano and the old YOLOv3-tiny. The execution environment was the same DPU (B4096).
- DPU execution time: 0.1168 s → 0.0154 s, approximately 1/8 the time
- CPU post-processing time: 0.1303 s → 0.0303 s, approximately 1/4 the time
- Total processing speed: 3.30 fps → 18.6 fps, approximately 5 times faster
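The fps figures are simply the reciprocal of the summed per-frame stage times; for example, for YOLOX-nano:

# fps = 1 / (pre-processing + DPU execution + post-processing)
pre, dpu_time, post = 0.0080, 0.0154, 0.0303  # seconds (YOLOX-nano)
total = pre + dpu_time + post                 # 0.0537 s
print(f"{1 / total:.2f} fps")                 # ~18.6 fps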
YOLOX-nano Results
Details of detected objects: [49, 60]
Pre-processing time: 0.0080 seconds
DPU execution time: 0.0154 seconds
Post-process time: 0.0303 seconds
Total run time: 0.0537 seconds
Performance: 18.63 FPS
(array([[ 458.1155, 125.8079, 821.8845, 489.5768],
[ 40.2464, 0. , 1239.7537, 720. ]]),
array([0.5618, 0.1179]),
array([49, 60]))
YOLOv3-tiny Results
Details of detected objects: [49, 60]
Pre-processing time: 0.0560 seconds
DPU execution time: 0.1168 seconds
Post-process time: 0.1303 seconds
Total run time: 0.3030 seconds
Performance: 3.30 FPS
(array([[ 157.7307, 455.4164, 434.6538, 812.3395],
[ 49.6795, 66.1538, 658.0765, 1213.8462]], dtype=float32),
array([0.2461, 0.7143], dtype=float32),
array([49, 60], dtype=int32))
Applying YOLOX to 360° Object Detection
We also applied YOLOX to 360° live-streaming object detection. The actual program can be found on the following GitHub repository:
sudo su
source /etc/profile.d/pynq_venv.sh
cd /src/yolox-test/
python3 app_gst-yolox-real-360-2divide.py
Below is the test video.
The 360° camera (RICOH THETA V) we used is an older USB 2.0 model, so plain live streaming runs at only about 6 fps.
We split the 1920x960 image into two 960x960 images for display, as sketched below.
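A minimal sketch of that split (the frame source here is illustrative):

import cv2

# One 1920x960 equirectangular frame from the 360° camera (example file).
frame = cv2.imread("frame_1920x960.jpg")

# Split the panorama down the middle into two square 960x960 halves.
left_half = frame[:, :960]
right_half = frame[:, 960:]

Each half can then be handled like an ordinary camera image.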
Implementing object detection with the slow YOLOv3 reduced the frame rate to about 1.5 fps.
Changing to YOLOX improved it to about 3.5 fps; further optimization might bring it closer to 6 fps.
Summary
We conducted object detection using the KR260's DPU with the lightweight "YOLOX-nano" model and PyTorch, and compared the execution speed of YOLOX and YOLOv3.
In the next project, we measure the speed of object detection using various architectures of the DPU (DPUCZDX8G).