Published April 18, 2022

Radar based 3D Object Detection for mobile robot

Running Center-based Radar and Camera Fusion for 3D Object Detection on KV260

1,113

Radar based 3D Object Detection for mobile robot

Things used in this project

Hardware components

AMD Kria KV260 Vision AI Starter Kit

Texas Instruments IWR6843AOPEVM

USB Webcam ZC-D2, Trademark Teaosiy

Digilent DC Motor/Gearbox (1:19 Gear Ratio): Custom 12V Motor

Digilent Pmod HB5

Digilent Pmod IOXP

Digilent 2x6-pin to Dual 6-pin Pmod Splitter Cable

Software apps and online services

Texas Instruments Code Composer Studio IDE

Texas Instruments MMWAVE-SDK

Texas Instruments Industrial mmWave toolbox

Texas Instruments mmWave Demo Visualizer

nuScenes

Netron

Eclipse IDE for Embedded C/C++ Developers

PyCharm

Story

Introduction

Mobile robots such as vacuum cleaner robots use lidar for advanced spatial planning, path planning, and obstacle detection. lidar has problems with solar radiation near windows, especially in this price segment. Radar sensors have no problem with solar radiation and are now available cheaper than lidar sensors even for the hobby.Basically the current research for Autonomous Vehicle mostly using a combination of 2D cameras and radar sensors. My idea is to have a 2D camera and a short range radar on a mobile robot and utilize one of the state of the art of the mentioned research.

Research, Try-Outs and Decisions

Model pt_centerpoint_astyxThe Model pt_centerpoint_astyx from the Vitis-AI model zoo was my first choice. I spend a lot of time to find out how it works regarding the exact input, outputs and it's meaning. The link to data set doesn't work anymore, but finally I found it on GitHub. I tried to do a retrain with the given source from Xilinx. In the end I couldn't manage to do the retrain, because of several different technical issues. Also the referenced paper to the used neuronal network in the original astyx data set is apparently not public. I had to requested it from the author and got it after some time. Finally I found out that only cars are annotated in the astyx data set with a few samples. So record and annotate new data would be required, which is a huge effort to take. So I decided to reject it.

Model CenterFusion (arXiv:2011.04841v1) CenterFusion is based on CenterTrack (arXiv:2004.01177v2), based on "Objects as Points" (arXiv:1904.07850v2) which claims to be SOTA. Basically CenterFusion is using camera data and radar point cloud trained with nuScenes dataset for 3D object detection. The nuScenes dataset contains at least 10 object classes, so I decided to go for it.

Selecting the radar sensorSparkFun Pulsed Radar Breakout - A111:I had already experimented with this radar sensor using a RPI3. He has an SPI interface which could also be connected to the PMOD of the KV260. It has only one TX and RX antenna. This means to get a point cloud from it the sensor needs be rotated (like LIDAR) or using a multiply sensor array. But in meantime this sensor is no longer in production and available. The successor would be the XC112 Connector Board + XR112 Radar Sensor Board. It can connect up to four XR112. But this is far to expensive and would blow my budget.

TI IWR6843AOP/IWR6843AOPEVM:This sensor has 3 TX and 4RX and is capable to deliver a radar point cloud with the out-of-the box demo firmware on a the evaluation board (IWR6843AOPEVM). The evaluation board is compact enough to mount it on a mobile robot. The data can be retrieved via serial interface. So it can plug in to USB interface of the KV260 and readout by the CPU. Because of this I decided to use this sensor.

Selecting the cameraI got the Kria KV260 Basic Accessory Pack from the contest and had planed to use the AR1335 for this task. But unfortunately the connector cable is to short to mount it in a propitiate way. I went forward with the Digilent Pcam which is similar to the RPI camera. But there was no hardware design available at this time to control and capture frames from the Pcam in Petalinux. So I decided to use a cheap 1080p Webcam usb camera. Maybe some remark: I choose the camera with a FoV of 120 degree which is close to the max FoV of the radar sensor, because I hope it is maybe easier to calibrate later. Mobile robot hardwareI will use the base plates (3D printout) and some of the hardware from Zybot project. To connect the Pmod HB5 and the DC Motors I will use the Pmod IOXP.At the end there will be sand-witches of base plates at the experimental platform, which needs probably extended size because of the KV260 (bigger than Zybo).I faced a big issue here, because I wanted to utilize the i2c0. I tried mapping it to the EMIO, mark as external port, assigned it to the PMDO IO etc. The kernel (Petalinux) recognized it, but couldn't power it up because of some "Power Domain failure". I couldn't trace it's cause and decided to the Xilinix I2C IP. It worked. I could control the Pmod IOXP checked by Logic analyzer.

1 / 2

Implementation

Operating SystemI decided to use PetaLinux (xilinx-k26-starterkit-2021_1). But in order to run successful my compiled model I needed to update the Vitis-AI libraries and tools to version 2.0. For this I created a new PetaLinux project using the provides BSP from the KV260 Wiki and added the Vitis-AI-Recipes. I also replaced the dropbear ssh with openSSH daemon so that remote debugging via Eclipse IDE and PyCharm IDE is working properly.

Capturing radar point cloud data

I flashed the pre-compiled "out of the box" demo (xwr6843AOP_mmw_demo.bin) from the TI Industrial mmWave toolbox and adapted the ROS1 driver from the Ti mmwave_industrial_toolbox to grab the data from UART. To find the right sensor configuration the mmWave Demo Visualizer are used to test directly on PC. The saved configuration parameter can then apply to the sensor via UART on startup.As I already mentioned the IWR6843AOPEVM is plug in one of the KV260 USB ports and powered by them. After the configuration is applied and the sensor is started the radar data can be obtained via UART in the specified format from Ti. Depending on the configuration (without heat map) I could manage to get a frame rate of about 30fps.

Quantization of the CenterFusion ModelThe convolutional backbone of the network reference by the paper is using DLA (Deep Layer Aggregation). So it could be quite a challenge in terms of run time performance for the KV260. I was happy as I saw that are already pre-trained models exist. Unfortunately the pre-trained models referenced by the paper are using mostly "Deformable Convolution" operations which are not supported by the Xilinx DPU-IP. So I had to retrain the model. To avoid stepping in all issues which might be coming up I decided to follow the paper and train and quantize the intermediary models too.

CenterTrackTraining

follow the instructions on the CenterTrack GitHub repository to setup the required environment on top of the Vitis-AI docker container
download, extract and convert the nuScenes dataset as it is described.
change to the source folder and run the object detection training (batch_size, gpus depend on your environment):

python main.py ddd --exp_id nuScenes_3Ddetection_e140 --dataset nuscenes --batch_size 12 --gpus 0 --lr 5e-4 --num_epochs 140 --lr_step 90,120 --save_point 90,120 --dla_node conv

after start the training with tracking:

python main.py tracking,ddd --exp_id nuScenes_3Dtracking --dataset nuscenes --pre_hm --load_model ../models/nuScenes_3Ddetection_e140.pth --shift 0.01 --scale 0.05 --lost_disturb 0.4 --fp_disturb 0.1 --hm_disturb 0.05 --batch_size 12 --gpus 0 --lr 2.5e-4 --save_point 10,20,30,60 --dla_node conv

Quantization

For this step the CenterTrackCustom repository is used.I created a python script "centernet_quant.py" which handles the quantization process like it is described in ug1414 (Xilinx documentation) with nearly the same command line options.

For the calibration run:

python centernet_quant.py ddd --exp_id nuScenes_3Ddetection_e140 --gpus 0 --dataset nuscenes --dla_node conv --data_dir /mnt/dataset --load_model ../models/nuScenes_3Ddetection_e170.pth --batch_size 1 --quant_mode calib --num_iters 500

For test and deploy run:

python centernet_quant.py ddd --exp_id nuScenes_3Ddetection_e140 --gpus 0 --dataset nuscenes --dla_node conv --data_dir /mnt/dataset --load_model ../models/nuScenes_3Ddetection_e170.pth --batch_size 1 --quant_mode test --num_iters 1 --deploy

Benchmark test using PetaLinux (xilinx-k26-starterkit-2021_1, Vitis-AI 2.0):

sudo xmutil unloadapp
sudo xmutil loadapp kv260-dpu-benchmark
Accelerator loaded to slot 0
#Identify the DPU graphs
xdputil xmodel DDLASegDDDTrack.xmodel -l 
#Run benchmark for graph 1 with one thread
xdputil benchmark DLASegDDDTrack.xmodel -i 1 1
FPS= 16.8081 number_of_frames= 1009 time= 60.0305 seconds.
#Run benchmark for graph 7 with one thread
xdputil benchmark DLASegDDDTrack.xmodel -i 7 1
FPS= 17.683 number_of_frames= 1062 time= 60.0577 seconds.

CenterFusionTraining

follow the instructions on the CenterFusion GitHub repository to setup the required environment on top of the Vitis-AI docker container
reuse the nuScenes dataset convert it again as it is described (radar point cloud data is needed this time).
change to the source folder and run the object detection training (batch_size, gpus depend on your environment):

python main.py ddd --exp_id centerfusion --shuffle_train --train_split train --val_split mini_val --val_intervals 1 --nuscenes_att --velocity --batch_size 8 --lr 2.5e-4 --num_epochs 60 --lr_step 50 --save_point 20,40,50 --gpus 0 --not_rand_crop --flip 0.5 --shift 0.1 --pointcloud --radar_sweeps 6 --pc_z_offset 0.0 --pillar_dims 1.0,0.2,0.2 --max_pc_dist 60.0 --load_model ../models/centernet_baseline_e170.pth --dla_node conv

Quantization

For this step the CenterFusionCustom repository is used.I created a python script "centerfusion_quant.py" which handles the quantization process like it is described in ug1414 (Xilinx documentation) with nearly the same command line options.

For the calibration run:

python centerfusion_quant.py ddd --exp_id centerfusion --dataset nuscenes --val_split mini_val --run_dataset_eval --nuscenes_att --velocity --pointcloud --radar_sweeps 6 --max_pc_dist 60.0 --pc_z_offset -0.0 --load_model ../models/centerfusion_e60.pth --flip_test --dla_node conv --data_dir /mnt/dataset --batch_size 1 --quant_mode calib --num_workers 0 --gpus 0

For test and deploy run:

python centerfusion_quant.py ddd --exp_id centerfusion --dataset nuscenes --val_split mini_val --run_dataset_eval --nuscenes_att --velocity --pointcloud --radar_sweeps 6 --max_pc_dist 60.0 --pc_z_offset -0.0 --load_model ../models/centerfusion_e60.pth --flip_test --dla_node conv --data_dir /mnt/dataset --batch_size 1 --quant_mode test --num_workers 0 --gpus 0 --deploy

The quantize and compiled models can be found in the models folder. The DPU architecture is "DPUCZDX8G_ISA0_B4096_MAX_BG2" and available if "kv260-dpu-benchmark" application is loaded. Because of the pytorch operations "aten::clone", "select", "unsqueeze" and "generate_pc_hm_custom" the inference needs to be scheduled in two DPU graphs. Compared to the plain CenterTrack the CPU load will higher because of the custom operaration.

Benchmark test using PetaLinux (xilinx-k26-starterkit-2021_1, Vitis-AI 2.0):

sudo xmutil unloadapp
sudo xmutil loadapp kv260-dpu-benchmark
Accelerator loaded to slot 0
#Identify the DPU graphs
xdputil xmodel DLASecWrapper.xmodel -l 
#Run benchmark for graph 1 with one thread
xdputil benchmark DLASecWrapper.xmodel -i 3 1
FPS= 16.8488 number_of_frames= 1011 time= 60.0041 seconds.
#Run benchmark for graph 7 with one thread
xdputil benchmark DLASecWrapper.xmodel -i 10 1
FPS= 9.77932 number_of_frames= 587 time= 60.0246 seconds.

Result

The benchmark of the quantized model reveals that even with a high use of FPGA resources it can't reach real time (at least 30fps). Assuming there is only one DPU core it will have a latency of probably 3-4 frames after camera/radar frame to have objects detected.

Follow Ups

Clarify the calibration of the model regarding different camera and radar sensors compared to the data set
evaluate the model performances compared to the quantized model
retrain the model with less complex convolutional backbone to increase the inference run time to 30fps - reduction of FPGA resources would be also good to have room for CPU intensive non-ML calculations
when satisfied with the model implement DPU scheduling first in python, c++ later
Idea is to have the radar sensor and "Model-component" ROS2 integrated
Deal with power supply

Conclusion

(in context of the Adaptive Computing Challenge 2021)I have underestimated the effort to have a full functional mobile robot which uses combination of 2D cameras and radar sensors for his tasks within the contest. Especially the Machine Learning part consumes the most of my available time. But still I'll continue to work on this topic because I had the idea long before the contest and the contest pushed me to finally start. The time for the challenge is over and I couldn't finish the project in this time. But I described the work I've done so far, the decisions I made, the issues I faced (also reflect by the source code changes) and will face. I wish I can inspire other people and collect some feedback.

Schematics

Code

torch_script_writer.py

def _write_forward(self, f: Callable, graph: Graph):
    indent_str = 4 * " "
    f.write('\n' + indent_str + "def forward(self, *args):\n")
    indent_str += indent_str
    self._collect_reuse_output(graph)
    for node in graph.nodes:
      forward_str, output_str = self._get_forward_str(node)
      format_forward_str = self._append_indent(indent_str, forward_str)
      f.write(format_forward_str + '\n')

    return_str = indent_str + 'return [{'
    for i, end_tensor in enumerate(graph.end_tensors):
      if i > 0:
        return_str = ', '.join(
            [return_str, str(i) + ':' + self.get_output_tensor_name(end_tensor)])
      else:
        return_str += str(i) + ':' + self.get_output_tensor_name(end_tensor)
       
    f.write(return_str + '}]\n')

Radar based 3D Object Detection for mobile robot

Things used in this project

Hardware components

Software apps and online services

Story

Introduction

Research, Try-Outs and Decisions

Implementation

Result

Follow Ups

Conclusion

Custom parts and enclosures

Base plate

Schematics

dummy

Code

torch_script_writer.py

CenterTrackCustom

iwr6843aop

CenterTrack

xilinx-k26-starterkit-2021_1

CenterFusion

CenterFusionCustom

Credits

robar

Comments

Embed the widget on your own site

Radar based 3D Object Detection for mobile robot

Radar based 3D Object Detection for mobile robot

Things used in this project

Hardware components

Software apps and online services

Story

Introduction

Research, Try-Outs and Decisions

Implementation

Result

Follow Ups

Conclusion

Custom parts and enclosures

Base plate

Schematics

dummy

Code

torch_script_writer.py

CenterTrackCustom

iwr6843aop

CenterTrack

xilinx-k26-starterkit-2021_1

CenterFusion

CenterFusionCustom

Credits

robar

Comments

Related channels and tags