Modern robot systems are often deployed in complex environments; for example, they are widely used to inspect dangerous sites in nuclear plants. Such robot systems require a highly efficient and reliable vision system to overcome the dynamic challenges of these extreme environments. Ideally, such a system should adaptively adjust its hardware and software to meet dynamic mission deadlines and reconfigure parts that fail due to radiation, restoring optimal performance in real time. This project addresses precisely this challenge by proposing a new flexible hardware architecture that provides adaptive support for a variety of DL-based video analytics algorithms on embedded devices. Producing DL-inference hardware with lower cost, lower power, and higher processing efficiency that can be dynamically configured for dedicated application specifications and operating environments requires radical innovation in the optimisation of the network architecture, software, and hardware of current DL techniques. The figure below shows how the proposed system uses the advanced capabilities of the Zynq UltraScale+ MPSoC to produce an adaptive video analytics hardware platform for critical missions that need additional performance and reliability.
Our design includes the following parts:
- a number of pre-built hardware designs with VCU and DPUs,
- a Python program to manage and switch between the hardware designs,
- a Vitis hardware platform with VCU for users to generate custom configurations,
- test programs running on VCU and DPUs.
Note: the code is available in my GitHub repo: https://github.com/luyufan498/Adaptive-deep-learning-hardware
The management program makes it convenient to add new configurations and switch between them at run time, as shown in the figure below.
Note: due to current limitations in Vivado 2020.1, Dynamic Function eXchange (DFX) is not fully supported, so a manual reboot is required in the current version of the project to enable a new configuration. Online switching and auto-trigger features will be supported later.
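For reference, here is a minimal sketch of what the switching flow looks like, reconstructed from the console output shown later in this article. hw_switch.py in the repo is the authoritative version; the boot-partition mount point used below is an assumption.

import glob, os, shutil, sys

BOOT_PART = "/media/sd-mmcblk0p1"  # boot partition mount point (assumed)

def scan_configs(conf_root):
    """Collect each configuration folder with its BOOT.BIN and .xclbin files."""
    confs = []
    for folder in sorted(glob.glob(os.path.join(conf_root, "*"))):
        boot = os.path.join(folder, "BOOT.BIN")
        if os.path.isfile(boot):
            confs.append({"path": folder,
                          "boot": boot,
                          "xclbins": glob.glob(os.path.join(folder, "*.xclbin"))})
    return confs

def switch_to(conf):
    """Replace BOOT.BIN and the xclbin on the boot partition, then offer a reboot."""
    print("clean previous conf.....")
    for old in glob.glob(os.path.join(BOOT_PART, "*.xclbin")):
        os.remove(old)
    print("copy new conf.....")
    shutil.copy(conf["boot"], os.path.join(BOOT_PART, "BOOT.BIN"))
    for xclbin in conf["xclbins"]:
        shutil.copy(xclbin, BOOT_PART)
    # DFX is not fully supported yet, so the new bitstream is loaded at boot
    if input("Do you want to reboot now?(y/n)").strip().lower() == "y":
        os.system("reboot")

confs = scan_configs(sys.argv[1])
for i, conf in enumerate(confs, 1):
    print(i, ": Folder:", conf["path"])
switch_to(confs[int(input("Type in index to load hardware configuration:")) - 1])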
The pre-built configurations were tested using the Xilinx face detection program from the Vitis AI Library.
All the pre-built configurations and their measured performance are listed below:
- Configuration 1: VCU only, no DPU
Conf 1 (0dpu): frequency: N/A, fps: N/A
- Configuration 2: VCU and 1 DPU with various settings
Conf 2.1 (1dpu_L): frequency: 150 MHz, fps: 450, power: 7.991 W
Conf 2.2 (1dpu_M): frequency: 200 MHz, fps: 501, power: 8.875 W
Conf 2.3 (1dpu_H): frequency: 300 MHz, fps: 744, power: 10.871 W
- Configuration 3: VCU and 2 DPUs with various settings
Conf 3.1 (2dpu_LL): frequencies: 150 and 150 MHz, fps: 811, power: 10.859 W
Conf 3.2 (2dpu_LM): frequencies: 150 and 200 MHz, fps: 877, power: 11.99 W
The power consumption and performance of the configurations are shown in the figure below:
Users can dynamically switch hardware configurations at run time according to the requirements of their AI tasks, for example as sketched below.
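The selection logic below is only an illustration and is not part of the released code; the configuration names and numbers come directly from the table above. It picks the lowest-power configuration that still meets the fps requirement of the current task.

# Measured points from the table above: (name, fps, power in watts).
CONFIGS = [
    ("1dpu_L", 450, 7.991),
    ("1dpu_M", 501, 8.875),
    ("1dpu_H", 744, 10.871),
    ("2dpu_LL", 811, 10.859),
    ("2dpu_LM", 877, 11.99),
]

def pick_config(required_fps, power_budget_w):
    """Return the lowest-power configuration that meets the fps requirement."""
    feasible = [c for c in CONFIGS
                if c[1] >= required_fps and c[2] <= power_budget_w]
    return min(feasible, key=lambda c: c[2]) if feasible else None

print(pick_config(500, 9.0))   # -> ('1dpu_M', 501, 8.875)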
Run this project from pre-built images
The design is based on the ZCU104 development board. An SD card with at least 16 GB of capacity is required, and an SD card reader is needed to write the image.
The pre-built ALL-IN-ONE image can be downloaded here.
Download it and flash your SD card with this image, then boot the ZCU104 from the SD card. The design should be ready to use.
If you are connecting to the board over UART, a username and password are required; the defaults for both are root.
Once you have successfully logged in, switch to the /mnt/sd-mmcblk0p1 folder:
root@xilinx-zcu104-2020_1:~# cd /mnt/sd-mmcblk0p1
The switching program is located there; run the following command to trigger a hardware switch:
root@xilinx-zcu104-2020_1:/mnt/sd-mmcblk0p1# python3 hw_switch.py hw_conf/
Then, the available configurations will be listed:
1 : Folder: /media/sd-mmcblk0p1/hw_conf/1dpu_L
2 : Folder: /media/sd-mmcblk0p1/hw_conf/1dpu_H
3 : Folder: /media/sd-mmcblk0p1/hw_conf/1dpu_M
4 : Folder: /media/sd-mmcblk0p1/hw_conf/2dpu_LL
5 : Folder: /media/sd-mmcblk0p1/hw_conf/2dpu_LM
6 : Folder: /media/sd-mmcblk0p1/hw_conf/0dpu
Type in index to load hardware confuragtion:
Type in a number to indicate which configuration you want to switch to.
Type in index to load hardware confuragtion:1
{'path': '/media/sd-mmcblk0p1/hw_conf/1dpu_L', 'boot': '/media/sd-mmcblk0p1/hw_conf/1dpu_L/BOOT.BIN', 'xclbins': ['/media/sd-mmcblk0p1/hw_conf/1dpu_L/dpu.xclbin']}
clean previous conf.....
copy new conf.....
NOTE: currently, users have to reboot the system to finish the hardware switch.
Do you want to reboot now?(y/n)
Type in y or n to reboot immediately or later. After the system reboots, the new configuration will be enabled.
To test the VCU, you will need a monitor and a USB camera. We used the See3CAM_CU30 included in the ZCU104 development kit in our tests.
Connect the camera to the ZCU104 through a USB port.
Connect the monitor to the ZCU104 through the DP port.
To test the output of the VCU, use the following command:
root@xilinx-zcu104-2020_1: gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,width=1280,height=720,framerate=30/1 ! kmssink bus-id=fd4a0000.zynqmp-display fullscreen-overlay=1
You should see the camera output on the monitor.
The Xilinx Vitis AI Library must be installed before using the DPU; you can follow the instructions on the Vitis AI GitHub to install it. We have also prepared a video guide for this. If you are using SSH to connect to the board, you can run the face detection program with the following command:
root@xilinx-zcu104-2020_1:~/Vitis-AI/vitis_ai_library/samples/facedetect# ./test_video_facedetect densebox_320_320 ~/video2.webm -t 8
You should be able to see output as shown in the figure above.
To test the performance, you can run the following command:
./test_performance_facedetect densebox_320_320 test_performance_facedetect.list -t 8 -s 60
The results should be similar to the following outputs:
root@xilinx-zcu104-2020_1:~/Vitis-AI/vitis_ai_library/samples/facedetect# ./test_performance_facedetect densebox_320_320 test_performance_facedetect.list -t 8 -s 60
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1129 23:39:30.294625 4163 benchmark.hpp:175] writing report to <STDOUT>
I1129 23:39:30.304394 4163 benchmark.hpp:201] waiting for 0/60 seconds, 8 threads running
I1129 23:39:40.304859 4163 benchmark.hpp:201] waiting for 10/60 seconds, 8 threads running
I1129 23:39:50.305155 4163 benchmark.hpp:201] waiting for 20/60 seconds, 8 threads running
I1129 23:40:00.305325 4163 benchmark.hpp:201] waiting for 30/60 seconds, 8 threads running
I1129 23:40:10.305497 4163 benchmark.hpp:201] waiting for 40/60 seconds, 8 threads running
I1129 23:40:20.305663 4163 benchmark.hpp:201] waiting for 50/60 seconds, 8 threads running
I1129 23:40:30.305908 4163 benchmark.hpp:210] waiting for threads terminated
FPS=460.996
E2E_MEAN=17340.7
DPU_MEAN=16112.1
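If you want to compare configurations programmatically, a small helper like the following can run the benchmark and extract the throughput. This is not part of the released code; the command line and the FPS= output line are taken from the run shown above.

import re, subprocess

def measure_fps(threads=8, seconds=60):
    """Run the Vitis AI face-detect benchmark and return the reported FPS."""
    cmd = ["./test_performance_facedetect", "densebox_320_320",
           "test_performance_facedetect.list",
           "-t", str(threads), "-s", str(seconds)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    match = re.search(r"FPS=([\d.]+)", result.stdout)
    return float(match.group(1)) if match else None

print(measure_fps())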
Build the hardware platform from scratch
The project starts with the hardware design in Vivado. In order to support both the VCU and the Vitis DPU, we built our own hardware platform from scratch instead of using the zcu104 base platform.
Important note: due to current limitations in Vivado 2020.1, the project cannot be exported with the DFX feature in GUI mode. This will be fixed in Vivado 2020.2, which will allow runtime switching in a future release.
Tools needed:
You will need to have the following tools for building the hardware platform:
- A Linux-based host OS supported by Vitis and PetaLinux (e.g. Ubuntu 20.04)
- Standard GNU build tools (make, etc.)
- Vivado 2020.1
- Vitis 2020.1
- PetaLinux 2020.1
1. Build the reference project:
The easiest way to build the hardware design is to modify the reference project provided by Xilinx: zcu104_base.
The embedded platform source files can be found on GitHub.
Download the source files by executing the following command:
$ git clone https://github.com/Xilinx/Vitis_Embedded_Platform_Source.git
Build the Vivado project of zcu104_base using the following commands:
$ cd Vitis_Embedded_Platform_Source/Xilinx_Official_Platforms/zcu104_base
$ make xsa
Once it is done, a Vivado project folder is generated at:
Vitis_Embedded_Platform_Source/Xilinx_Official_Platforms/zcu104_base/vivado
2. Modify the project to support both the VCU and the Vitis DPU:
Launch Vivado and open the project: in GUI mode, use File -> Project -> Open.
Open the block design; you will see a layout like this.
Double-click the ps_e block to open the configuration panel. We need to enable some ports for connecting the VCU module.
PS-PL Configuration -> General -> interrupts -> PL to PS
set IRQ1[0-7] to 1
Important note: IRQ0 will be used later in Vitis. LEAVE IT UNCONNECTED, DO NOT CHANGE IT!
PS-PL Configuration -> PS-PL Interfaces -> Slave Interface -> AXI HP
Enable HP1 and HP2.
Add two AXI Interconnect modules and set the number of master interfaces of each to 1. Connect them to HP1 and HP2 respectively.
Add the VCU IP to the block design. Then use 'Run Connection Automation' to connect the following AXI interfaces:
M_AXI_DEC0 -> S_AXI_HP1_FPD
M_AXI_DEC1 -> S_AXI_HP1_FPD
M_AXI_ENC0 -> S_AXI_HP2_FPD
M_AXI_ENC1 -> S_AXI_HP2_FPD
M_AXI_MCU -> AXI_HPM0_FPD
Note: the bandwidth available to the VCU is limited in this configuration because we want to leave more bandwidth for the DPUs.
Connect vcu_0/vcu_resetn to ps_e/pl_resetn0.
Connect vcu_0/vcu_host_interrupt to ps_e/pl_ps_irq1.
The VCU generates its clocks using an internal PLL, so we need to provide a reference clock to drive that PLL. Add a Clocking Wizard IP and modify the settings as in the following figure:
Input: 100 MHz
Output: 33.3 MHz
Connect pl_clk0 to clk_in1, and clk_out1 to vcu_0/pll_ref_clk.
3. Set the properties of the modules:
In order to support Vitis acceleration, we need to set properties on the modules. These properties will be recognised by the Vitis tools later and used to connect the customised hardware accelerators. More information can be found here.
In the zcu104_base project, some of the properties are pre-set; we need to change them to fit our design. Click the axi_interconnect_hp0 block and you will find a property called PFM.AXI_PORT.
Modify it as follows:
S02_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S03_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S04_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S05_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S06_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S07_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S08_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S09_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S10_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S11_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S12_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S13_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S14_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"} S15_AXI {memport "S_AXI_HP" sptag "HPC0" memory "ps_e HPC0_DDR_LOW"}
Click the ps_e block. Modify its PFM.AXI_PORT to:
M_AXI_HPM1_FPD {memport "M_AXI_GP" sptag "" memory ""} S_AXI_HPC1_FPD {memport "S_AXI_HPC" sptag "HPC1" memory "ps_e HPC1_DDR_LOW"} S_AXI_HP0_FPD {memport "S_AXI_HP" sptag "HP0" memory "ps_e HP0_DDR_LOW"}
Click axi_interconnect_0. There is no PFM property on this block yet; there are two ways to add it:
- Use the following command:
set_property PFM.AXI_PORT {} [get_bd_cells /axi_interconnect_0]
- Or, in GUI mode, enable any one of the axi_interconnect_0 interfaces in the Platform Interfaces panel and you will see the PFM property.
Set the PFM.AXI_PORT property of axi_interconnect_0 to:
S02_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S03_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S04_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S05_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S06_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S07_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S08_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S09_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S10_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S11_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S12_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S13_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S14_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"} S15_AXI {memport "S_AXI_HP" sptag "HP1" memory "ps_e HP1_DDR_LOW"}
Set the PFM.AXI_PORT property of axi_interconnect_1 to:
S02_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S03_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S04_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S05_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S06_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S07_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S08_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S09_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S10_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S11_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S12_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S13_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S14_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"} S15_AXI {memport "S_AXI_HP" sptag "HP2" memory "ps_e HP2_DDR_LOW"}
NOTE: axi_interconnect_0 is connected to HP1 and axi_interconnect_1 is connected to HP2.
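These PFM.AXI_PORT values are long and repetitive, so typos are easy to make. As a purely illustrative helper (not part of the project), a few lines of Python can generate the strings for pasting into Vivado:

def pfm_axi_ports(sptag, first=2, last=15):
    """Build the S02_AXI..S15_AXI value for an interconnect's PFM.AXI_PORT."""
    entry = ('S{i:02d}_AXI {{memport "S_AXI_HP" sptag "{tag}" '
             'memory "ps_e {tag}_DDR_LOW"}}')
    return " ".join(entry.format(i=i, tag=sptag) for i in range(first, last + 1))

print(pfm_axi_ports("HP1"))  # value for axi_interconnect_0
print(pfm_axi_ports("HP2"))  # value for axi_interconnect_1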
Now your Vivado project is ready. Save it and click the “Generate Bitstream” button. When the bitstream is successfully generated, go to File -> Export -> Export Hardware. Select the Expandable platform type, select Pre-synthesis, and check Include bitstream. Give it a name you like and export; you should find an .xsa file in your project folder. We will use this XSA to build our Linux system in PetaLinux.
Please follow the guide here to generate the PetaLinux project.
Note: you may need to edit the platform name in the initial file in order to use the pre-set Makefile.
After you complete this step, a Vitis hardware platform should be ready to use.
Step 3: Generate Configurations
You can follow this guide to generate the required hardware configurations.
However, if you want to generate them in GUI mode, please use the following steps:
Open Vitis: File -> New -> Application Project.
Select your platform and create an empty project. Add your dpu.xo (use this guide for how to generate dpu.xo).
Add a DPU hardware function by using the lightning button.
Change the number of DPUs that you want to include.
In order to edit the interface connections, we need to edit the settings of binary_container_1.
In the V++ linker options, add the following flag:
--config prj_config
Create a new file at <vitis project folder>/Hardware/prj_config and add your connectivity configuration there.
Here is an example of a 2-DPU config file:
[clock]
#################################################################
# It seems that there are two kinds of format to indicate
# which clock source you are using:
# you can use clock ids, which are set in the PFM property of the clock source,
# or you can use something like freqHz=300000000:DPUCZDX8G_1.aclk.
# I didn't find a document about this,
# so I made this file based on the Xilinx example project.
# Here is a list of clock frequency:
# id 0 --- 150MHz
# id 1 --- 300MHz
# id 2 --- 75MHz
# id 3 --- 100MHz
# id 4 --- 200MHz
# id 5 --- 400MHz
# id 6 --- 600MHz
####################################################################
id=4:DPUCZDX8G_1.aclk
id=5:DPUCZDX8G_1.ap_clk_2
id=0:DPUCZDX8G_2.aclk
id=1:DPUCZDX8G_2.ap_clk_2
[connectivity]
sp=DPUCZDX8G_1.M_AXI_GP0:HPC1
sp=DPUCZDX8G_1.M_AXI_HP0:HP0
sp=DPUCZDX8G_1.M_AXI_HP2:HP3
sp=DPUCZDX8G_2.M_AXI_GP0:HPC0
sp=DPUCZDX8G_2.M_AXI_HP0:HP1
sp=DPUCZDX8G_2.M_AXI_HP2:HP2
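If you need several variants of this file (for example the 1-DPU configurations), it can be convenient to generate them from a template. The sketch below is only an illustration based on the clock-id table in the comments above; note that in the 2-DPU example each ap_clk_2 runs at twice the corresponding aclk, and the example port mapping mirrors DPU 1.

# Clock ids from the comment block above.
CLOCK_IDS = {150: 0, 300: 1, 75: 2, 100: 3, 200: 4, 400: 5, 600: 6}

def make_prj_config(dpu_mhz, sp_map):
    """dpu_mhz: aclk frequency (MHz) per DPU; ap_clk_2 runs at 2x aclk.
    sp_map: (interface, port) pairs for the [connectivity] section."""
    lines = ["[clock]"]
    for n, mhz in enumerate(dpu_mhz, 1):
        lines.append(f"id={CLOCK_IDS[mhz]}:DPUCZDX8G_{n}.aclk")
        lines.append(f"id={CLOCK_IDS[2 * mhz]}:DPUCZDX8G_{n}.ap_clk_2")
    lines += ["", "[connectivity]"]
    lines += [f"sp={iface}:{port}" for iface, port in sp_map]
    return "\n".join(lines)

# Example: a single DPU at 200 MHz (the 1dpu_M setting).
print(make_prj_config([200], [
    ("DPUCZDX8G_1.M_AXI_GP0", "HPC1"),
    ("DPUCZDX8G_1.M_AXI_HP0", "HP0"),
    ("DPUCZDX8G_1.M_AXI_HP2", "HP3"),
]))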
Download the pre-designed configuration file.
Then save the file, and you are ready to build the hardware.
Normally the build process takes 1 to 5 hours, depending on the complexity of the hardware design. Some connectivity settings may fail due to timing issues, and you will not see the errors until hours later, so generating hardware configurations does take time.
Note: in Vitis 2020.1, it seems the tools will not package the image if there is no C source file. In order to get the BOOT.BIN (under /Hardware/package), you can add an empty main function (e.g. a main.c containing just int main(){ return 0; }) to your Vitis project; the image will then be generated accordingly.
When the hardware build is done, binary_container_1.xclbin and BOOT.BIN will be generated. Rename binary_container_1.xclbin to dpu.xclbin. Copy dpu.xclbin and BOOT.BIN to /mnt/sd-mmcblk0p1/hw_conf/<your conf name>/. The configuration files will then be ready to use. The management program (download it here) can detect new configurations automatically. To test the system, please follow the instructions in the "Run this project from pre-built images" section at the beginning of this page.