Self-driving cars must cope with high-speed driving scenarios such as highways and trunk roads, and must respond promptly to sudden events such as lane changes by adjacent vehicles, sudden failures of vehicles ahead, or pedestrians suddenly appearing ahead. The detection system therefore needs strong real-time performance, and the hardware acceleration platform is the most critical part of the whole system.
The intelligent acceleration platform designed in this project, called Booster, can accelerate a variety of deep learning algorithms such as YOLOv3 or SSD. It helps the autonomous driving system understand complex road conditions quickly and comprehensively, accurately perceive and predict traffic conditions while the vehicle is moving, and make accurate judgments about the surrounding environment in real time. It can also be applied in embedded fields such as intelligent security, smart transportation, and industrial production.
System Architecture
The entire system is built on the Xilinx ZYNQ heterogeneous computing platform. The ARM (CPU) side runs Linux with the PYNQ software framework. The CPU configures the accelerator and the DMA through the AXI-GP interface, while accelerator data is transferred by the DMA through the AXI-HP interface over the AXI-Stream bus. The accelerator system, called the Booster System, contains a buffer module, a computation module, and a bus interface module.
The buffer module contains the instruction buffer, the BN parameter buffer, the weight buffer, and the input and output buffers for the image and the computation results.
The computation module is the core of Booster. It mainly consists of the controller, two feature map buffers, the Processing Elements (PEs), a preprocessing module, an accumulation buffer, a BN-ReLU module (referred to as the BR module), and a pooling module.
Main Features
- A generalized accelerator architecture, called Booster, consisting of a buffer part and a computation part. The buffer module holds the configuration of the network model: the user can deploy different CNN models directly by writing the model parameters to the parameter buffer and the instructions describing the model structure to the instruction buffer of the accelerator system.
- A high-utilization PE design that supports three types of convolution: standard convolution, depthwise convolution, and pointwise convolution
- Software and hardware co-design based on quantization-aware training (INT8, W8A8)
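As a rough illustration of what W8A8 implies, the sketch below quantizes floating-point values to signed 8-bit codes with a symmetric per-tensor scale and then dequantizes them. This is the "fake quantization" step commonly used in the forward pass of quantization-aware training. The max-abs scale choice and the function names are assumptions made for illustration. They are not taken from Booster's training code.

```python
def quantize_int8(values, scale):
    """Map floats to signed 8-bit codes, clamping to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in codes]


def fake_quantize(values):
    """Quantize then dequantize, returning (codes, approximations, scale)."""
    max_abs = max(abs(v) for v in values)
    # Assumption: symmetric per-tensor scale derived from the largest magnitude.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = quantize_int8(values, scale)
    return codes, dequantize(codes, scale), scale


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical weight values
codes, approx, scale = fake_quantize(weights)
```

For in-range values, the rounding error is at most half a quantization step (`scale / 2`). Quantization-aware training exposes the network to this error during training so that it can adapt to it.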
Supported operators
- Standard Convolution, Depthwise Convolution, Pointwise Convolution
- Max Pooling, Average Pooling
- ReLU
- Batch Normalization
- Concat
- Upsample
- Branch
- Round
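The three convolution operators in the list differ mainly in how many multiply-accumulate (MAC) operations they need, which is why MobileNet-style models pair depthwise and pointwise layers. The sketch below counts MACs for each type using the standard textbook formulas. The layer dimensions are hypothetical and do not describe a specific layer of the deployed model.

```python
def conv_macs(h_out, w_out, c_in, c_out, k, kind):
    """Multiply-accumulate count for one convolution layer (bias ignored)."""
    if kind == "standard":
        # Every output channel sees every input channel through a k x k kernel.
        return h_out * w_out * c_out * c_in * k * k
    if kind == "depthwise":
        # One k x k filter per input channel; c_out is unused (equals c_in).
        return h_out * w_out * c_in * k * k
    if kind == "pointwise":
        # A 1x1 standard convolution; k is unused.
        return h_out * w_out * c_out * c_in
    raise ValueError(f"unknown convolution kind: {kind}")


# Hypothetical layer: 80x40 output map, 128 -> 256 channels, 3x3 kernel.
shape = dict(h_out=40, w_out=80, c_in=128, c_out=256, k=3)
standard = conv_macs(kind="standard", **shape)
separable = conv_macs(kind="depthwise", **shape) + conv_macs(kind="pointwise", **shape)
saving = standard / separable
```

For this example shape the depthwise-plus-pointwise pair needs roughly one ninth of the MACs of the equivalent standard convolution.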
Resource
| Resource | Used | Available | Utilization (%) |
| --- | --- | --- | --- |
| LUT | 98193 | 230400 | 42.62 |
| LUTRAM | 28190 | 101760 | 27.70 |
| FF | 72421 | 460800 | 15.72 |
| BRAM | 156 | 312 | 50.00 |
| URAM | 96 | 96 | 100.00 |
| DSP | 625 | 1728 | 36.17 |
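The percentage column can be reproduced from the two count columns. The short check below recomputes it from the figures in the table; the dictionary layout is only an illustration.

```python
# (used, available) pairs copied from the resource table.
resources = {
    "LUT": (98193, 230400),
    "LUTRAM": (28190, 101760),
    "FF": (72421, 460800),
    "BRAM": (156, 312),
    "URAM": (96, 96),
    "DSP": (625, 1728),
}

# Utilization in percent, rounded to two decimals as in the table.
utilization = {
    name: round(100 * used / available, 2)
    for name, (used, available) in resources.items()
}

# The resource closest to exhaustion.
tightest = max(utilization, key=utilization.get)
```

URAM is fully used, so it is the resource most likely to limit any further growth of the on-chip buffers.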
Power
- Model: MobileNetV1-YoloV3-tiny
- Image Size: 640x320
- Dataset: BDD100k
- mAP after training with 7 classes
| Class | mAP@0.5:0.95 (FP32) | mAP@0.5:0.95 (INT8) |
| --- | --- | --- |
| all | 0.193 | 0.185 |
| person | 0.153 | 0.147 |
| rider | 0.0869 | 0.0886 |
| car | 0.367 | 0.349 |
| bus | 0.274 | 0.259 |
| truck | 0.268 | 0.251 |
| bike | 0.107 | 0.105 |
| motor | 0.0938 | 0.0949 |
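The figures in the table are consistent with the "all" row being the unweighted mean of the seven per-class values, although the source does not state how it was computed. The sketch below checks that reading and derives the relative accuracy loss from quantization.

```python
# Per-class mAP@0.5:0.95 values copied from the table.
fp32 = {"person": 0.153, "rider": 0.0869, "car": 0.367, "bus": 0.274,
        "truck": 0.268, "bike": 0.107, "motor": 0.0938}
int8 = {"person": 0.147, "rider": 0.0886, "car": 0.349, "bus": 0.259,
        "truck": 0.251, "bike": 0.105, "motor": 0.0949}

# Assumption: "all" is the unweighted mean over classes.
mean_fp32 = sum(fp32.values()) / len(fp32)
mean_int8 = sum(int8.values()) / len(int8)

# Relative drop of the reported "all" value when moving from FP32 to INT8.
relative_drop = (0.193 - 0.185) / 0.193

# Classes whose INT8 score is not lower than the FP32 score.
not_degraded = [name for name in fp32 if int8[name] >= fp32[name]]
```

Under this reading, INT8 quantization costs about 4% of the overall mAP in relative terms, and the two weakest classes (rider and motor) are not degraded at all.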
Scene I: Downtown street scene application
- Model: MobileNetV1-YoloV3-tiny, image size 640x320x3
- Platform: ZCU104 + PYNQ
- Scene setup: frames are read from the camera with OpenCV-Python, then passed sequentially through image resize, accelerator inference, NMS, and box display.
Scene II: Highway scene application
- Model: MobileNetV1-YoloV3-tiny, image size 640x320x3
- Platform: ZCU104 + PYNQ
- Scene setup: frames are read from a video file with OpenCV-Python, then passed sequentially through image resize, accelerator inference, NMS, and box display.
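Both scenes run NMS on the CPU after accelerator inference. The following is a minimal pure-Python sketch of greedy, class-agnostic non-maximum suppression, included to show what that stage does. The box format `(x1, y1, x2, y2)`, the 0.45 IoU threshold, and the function names are assumptions; the project's actual NMS implementation is not shown in the source.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicates and one separate box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 50, 260, 120)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The greedy loop is quadratic in the number of candidate boxes, so filtering low-confidence detections before NMS is a common way to reduce its CPU cost.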
| Test | FPS |
| --- | --- |
| Video Read Baseline | 67 |
| Camera Read Baseline (Camera1 / Camera2) | 14 / 90 |
| Booster | 52 |
| Booster + NMS (CPU) | 33 |
| Video + Image Resize + Booster + NMS | 20 |
| Camera + Image Resize + Booster + NMS (Camera1 / Camera2) | 10 / 24 |
- Camera1 is a Logitech C270 (720p, 30 fps); its Camera Read Baseline is 14 FPS.
- Camera2 is a See3CAM_CU30_CHL_TC_BX (1080p, 60 fps); its Camera Read Baseline is 90 FPS.
- The Booster clock frequency is 215 MHz, and the AXI bus width is 32 bits.
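The FPS figures can be turned into approximate per-stage latencies. The sketch below does this under the assumption that the stages run strictly one after another for each frame, so that latencies add up; the source does not confirm this. It also computes a peak AXI transfer rate, assuming one 32-bit beat per cycle at the Booster clock. That is an upper bound, not a measured value.

```python
# FPS values copied from the table.
fps = {"booster": 52, "booster_nms": 33, "video_full": 20}

# Per-frame time in milliseconds for each configuration.
frame_ms = {name: 1000 / value for name, value in fps.items()}

# Assumption: stages are serial, so differences isolate the added stage.
nms_ms = frame_ms["booster_nms"] - frame_ms["booster"]
read_resize_ms = frame_ms["video_full"] - frame_ms["booster_nms"]

# Assumption: the AXI stream moves one 32-bit beat per 215 MHz cycle.
peak_axi_mb_per_s = 215e6 * 32 / 8 / 1e6
```

Under these assumptions, inference takes about 19 ms per frame, NMS on the CPU adds about 11 ms, and video read plus resize adds about 20 ms. The CPU-side stages therefore account for more of the end-to-end frame time than the accelerator.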