This project is part 3 of a 5-part series of projects, where we will progressively create AI-enabled platforms for the following Tria development boards:
- ZUBoard
- Ultra96-V2
- UltraZed-7EV
These projects can be rebuilt using the source code on github.com:
The following series of Hackster projects describes how the above GitHub repository was created, and serves as its documentation:
- Part 1 : Tria Vitis Platforms - Building the Foundational Designs
- Part 2 : Tria Vitis Platforms - Creating a Common Platform
- Part 3 : Tria Vitis Platforms - Adding support for Vitis-AI
- Part 4 : Tria Vitis Platforms - Adding support for Hailo-8
- Part 5 : Tria Vitis Platforms - Adding support for ROS2
The motivation of this series of projects is to enable users to create their own custom AI applications.
Introduction - Part III
In the previous project (Part 2), we created Vitis platforms that we could augment with accelerators. In this project, we will add Vitis-AI 3.5 functionality to our common platform using the following steps:
- Adding the Vitis-AI 3.5 yocto recipes
- Adding the VVAS 3.0 yocto recipes
- Creating the "benchmark" overlays
- Creating the "dualcam_dpu" overlays
- Creating firmware recipes for each design
Before we dive into the above steps, it is worth taking a step back to look at the repository that was used as inspiration for this implementation.
I have chosen to mimic the directory structure used by the Kria platforms to do this, since I find them easy to understand and easy to expand on.
Understanding kria-vitis-platforms
Before continuing to expand our own equivalent of the kria-vitis-platforms repository, it is worth understanding its contents.
As of this writing, the kria-vitis-platforms repository has been updated for 2023.2:
kria-vitis-platforms
├── k26
├── kr260
└── kv260
    ├── overlays
    │   ├── dpu_ip
    │   └── examples
    │       ├── aibox-reid
    │       ├── benchmark
    │       ├── defect-detect
    │       ├── nlp-smartvision
    │       └── smartcam
    └── platforms
        ├── scripts
        └── vivado
            ├── kv260_ispMipiRx_rpiMipiRx_DP
            ├── kv260_ispMipiRx_vcu_DP
            ├── kv260_ispMipiRx_vmixDP
            └── kv260_vcuDecode_vmixDP
The first level of directories corresponds to the platforms (k26, kr260, kv260).
For the KV260 platform, the directory structure divides into two main sections:
- platforms
- overlays
The platforms directory contains the source code to re-generate the Vivado projects and Vitis platforms.
The overlays directory contains accelerator examples that each target one of the platforms.
To recap from (Part 2), Vitis platforms usually include a hardware component (Vivado project) and a software component (bare metal, Linux, or other). Note that in our case the software is greyed out, since we will be creating a common Petalinux project, separate from the Vitis platforms. This is the strategy used by the AMD Kria platforms, which we are taking inspiration from.
The most important point to note is that the Vitis platform provides the following information regarding the available resources to the Vitis toolflow:
- clocks (and associated resets)
- interrupts
- AXI connectivity
Here is a graphical representation of the platforms and overlays that we can find for the KV260.
AMD uses a specific naming scheme for their KV260 Vitis Platforms:
- "kv260" : prefix for each platform
- "ispMipiRx" : MIPI capture pipeline for On Semiconductor (IAS) sensor modules
- "rpiMipiRx" : MIPI capture pipeline for Raspberry PI camera modules
- "vcu" : encode/decode VCU
- "vcuDecode" : decode-only VCU
- "DP" : DisplayPort output
- "vmixDP" : DisplayPort output with Video Mixer IP (supporting multiple channels)
The "ispMipiRx" capture pipeline is also different on each Vitis platform:
- kv260_ispMipiRx_vcu_DP : color (NV12) pipeline supporting 4K resolution
- kv260_ispMipiRx_rpiMipiRx_DP : color (RGB24) pipeline supporting 1080P resolution
- kv260_ispMipiRx_vmixDP : grayscale (Y8) pipeline supporting 1280x800 resolution
The following diagrams illustrate the MIPI capture pipelines for each of the various Vitis platforms:
The following diagrams illustrate the contents of the Vitis Overlays:
The "smartcam" and "aibox-reid" overlays include two accelerators:
- DPU (B3136)
- image pre-processing.
The "benchmark" and "nlp-smartvision" overlays include one accelerator:
- DPU (B4096 for benchmark, B3136 for nlp-smartvision)
Notice that for the "benchmark" overlay, the DPU is the largest architecture that fits inside the device's PL resources. This is the configuration that AMD has used to publish performance numbers for the KV260.
The "defect-detect" overlay does not include the DPU engine, but rather the following accelerators:
- preprocess (image pre-processing)
- gaussian OTSU
- CCA
We will want to reproduce the "benchmark" overlay, which contains the largest DPU core that fits in the available resources.
We will also create a "dualcam_dpu" overlay, which includes a DPU that fits in the PL resources remaining alongside the MIPI capture pipeline.
A Note on PL resource availability
One of the great features of the DPU engine is its scalability. The DPU engine comes in various architectures, allowing the user to select the one that fits inside the available PL resources.
The largest DPU architecture is the B4096, which allows up to 3 instances, and the smallest is the B128.
If we take the case of the ZUBoard, we can fit the B512 inside the ZU1CG device's PL resources. However, if we try to select a DPU architecture that fits alongside the DualCam MIPI capture pipeline, even the B128 does not fit, as shown in the following Excel sheet: there are not enough LUTs available.
In order to leave enough PL resources available for the B128 DPU architecture, I had to scale the MIPI capture pipeline down from 2 pixels per clock (supporting 4K resolution) to 1 pixel per clock (supporting 1080P resolution).
With this modification, the B128 was able to fit inside the design.
This balancing of features with respect to available resources is a typical design trade-off with FPGA and SoC devices.
For this reason, I created the following new Vitis Platforms, in addition to the existing "base" and "dualcam" platforms:
- minimal : same as "base", removing all unnecessary AXI peripherals
- dualcam1 : same as "dualcam", removing all unnecessary AXI peripherals
- dualcam2 : same as "dualcam1", scaling MIPI capture pipeline down to PPC=1
One such unnecessary AXI peripheral is the AXI interrupt controller (AXI_INTC) used to make interrupts available to Vitis. The modified platforms implement this with a direct connection to the PS_PL_IRQ port.
Other unnecessary AXI peripherals include the AXI_GPIO instances for LEDs and push-buttons. Although these AXI peripherals do not consume many resources, when a design gets tight on available resources, every little bit counts!
Recall from the previous project (Part 2) that the AXI interconnect width is defined in the dynamic device tree, specifically with the "xlnx,afi-fpga" binding:
In the "dualcam2" platform, we changed the data path from 2 pixels per clock (128-bit width) to 1 pixel per clock (64-bit width). This change needs to be reflected in the "afi0" definition; otherwise, the following undesired artifacts (vertical bars repeating at 64-bit/128-bit boundaries) will appear:
The key is understanding which AXI ports (HP0, HP1, etc.) correspond to the AXI FIFO entries configured by the "xlnx,afi-fpga" binding. I have worked this out in the following table:
Therefore, the afi0 definition is modified as follows for the "dualcam2" design:
afi0: afi0 {
    compatible = "xlnx,afi-fpga";
    /* dualcam2: entries <4 1> and <5 1> are the ones set for the 64-bit (1 pixel per clock) data path */
    config-afi = <0 0>, <1 0>, <2 0>, <3 0>, <4 1>, <5 1>, <6 0>, <7 0>, <8 0>, <9 0>, <10 0>, <11 0>, <12 0>, <13 0>, <14 0xa00>, <15 0x000>;
    resets = <&zynqmp_reset116>, <&zynqmp_reset117>, <&zynqmp_reset118>, <&zynqmp_reset119>;
    reset-names = "pl0", "pl1", "pl2", "pl3";
};
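To double-check the table against the afi0 node above, here is a minimal Python sketch that renders the config-afi line from a per-port width map. It assumes, based on my reading of the Xilinx "xlnx,afi-fpga" binding, that register IDs 0 to 13 are read/write pairs for HPC0, HPC1, HP0, HP1, HP2, HP3 and LPD (in that order), and that widths are encoded as 0 = 128-bit, 1 = 64-bit, 2 = 32-bit; entries 14 and 15 are passed through unchanged. The helper name build_config_afi is hypothetical, so confirm the mapping against your own table before relying on it.

```python
# Hypothetical helper: render the config-afi property of an "xlnx,afi-fpga" node.
# ASSUMPTION: register IDs come in read/write pairs per slave port, in this order,
# and widths are encoded as 0 = 128-bit, 1 = 64-bit, 2 = 32-bit.
PORT_REGIDS = {
    "HPC0": (0, 1),
    "HPC1": (2, 3),
    "HP0": (4, 5),
    "HP1": (6, 7),
    "HP2": (8, 9),
    "HP3": (10, 11),
    "LPD": (12, 13),
}
WIDTH_CODES = {128: 0, 64: 1, 32: 2}


def build_config_afi(widths, reg14="0xa00", reg15="0x000"):
    """Return a config-afi line for a {port: data_width_in_bits} mapping.

    Ports that are not listed keep the 128-bit default (code 0).
    Entries 14 and 15 are copied verbatim from the arguments.
    """
    codes = [0] * 14
    for port, bits in widths.items():
        read_id, write_id = PORT_REGIDS[port]
        codes[read_id] = codes[write_id] = WIDTH_CODES[bits]
    cells = [f"<{regid} {code}>" for regid, code in enumerate(codes)]
    cells += [f"<14 {reg14}>", f"<15 {reg15}>"]
    return "config-afi = " + ", ".join(cells) + ";"


# dualcam2: under the assumed mapping, only HP0 drops to 64 bits
print(build_config_afi({"HP0": 64}))
```

Under these assumptions, the printed line is identical to the config-afi line of the dualcam2 afi0 node.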
Adding the Vitis-AI 3.5 yocto recipes
Since we want version 3.5 of the Vitis-AI packages, we need to copy the new yocto recipes, as described here:
When using these recipes with Vitis, we need to remove one file, the vart_3.5_vivado.bb recipe, which is meant to be used for a Vivado-only flow:
recipes-vitis-ai/vart/vart_3.5_vivado.bb
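The copy-and-remove step can be sketched in Python as follows. The function name and both paths are hypothetical; it assumes recipes_src points at the recipes-vitis-ai directory from the Vitis-AI 3.5 repository and petalinux_project at your Petalinux project root.

```python
import shutil
from pathlib import Path


def install_vitis_ai_recipes(recipes_src, petalinux_project):
    """Copy recipes-vitis-ai into project-spec/meta-user, minus the Vivado-only recipe."""
    recipes_src = Path(recipes_src)
    dest = Path(petalinux_project) / "project-spec" / "meta-user" / recipes_src.name
    shutil.copytree(recipes_src, dest, dirs_exist_ok=True)  # Python 3.8+
    # vart_3.5_vivado.bb targets the Vivado-only flow and must not be used with Vitis.
    vivado_recipe = dest / "vart" / "vart_3.5_vivado.bb"
    if vivado_recipe.exists():
        vivado_recipe.unlink()
    return dest
```

The source tree is left untouched; only the copy inside the Petalinux project loses the Vivado-only recipe.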
These recipes were added to the project-spec/meta-user directory:
We also want to add Vitis Video Analytics SDK (VVAS) functionality, allowing the use of HLS-accelerated OpenCV functions and various other signal processing functions.
This SDK allows us to not only accelerate the model inference, but also the pre-processing, in order to achieve Whole Application Acceleration (WAA).
These recipes were also added to the project-spec/meta-user directory:
We start by creating a symbolic link to the DPUCZDX8G_ip_repo_VAI_v3.0 directory in the common/overlays directory (downloaded with the get_dpu_ip.sh script). Then we make a copy of the "benchmark" build scripts from the KV260 example. The scripts to modify for our use case are indicated in pink:
The Zynq UltraScale+ devices on the ZUBoard, Ultra96-V2, and UltraZed-EV hardware do not have the same amount of PL resources as the K26 on the KV260 hardware. For this reason, the DPU configuration must be changed to reflect the available resources:
The DPU configuration is specified in the following file for each hardware target:
tria-vitis-platforms/{hwcore}/overlays/examples/benchmark/dpu_conf.vh
In this file, the following parameters are specified:
- DPU architecture : B512 (ZUBoard), B2304 (Ultra96-V2), B4096 (UltraZed-EV)
- URAM usage mode : disabled (ZUBoard & Ultra96-V2), enabled (UltraZed-EV)
- RAM usage mode : low
- DSP48 usage : low (ZUBoard & Ultra96-V2), high (UltraZed-EV)
Determining these parameters is mostly a matter of trial and error. Feel free to explore...
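To keep the three targets consistent while experimenting, the settings listed above can be captured in a small table and rendered as Verilog `define lines. This is a sketch only: the macro names (B512, URAM_DISABLE, RAM_USAGE_LOW, DSP48_USAGE_LOW, ...) are assumed to match the DPUCZDX8G dpu_conf.vh template, and the real file contains further settings that this sketch leaves alone.

```python
# Per-target DPU settings, transcribed from the list above.
DPU_SETTINGS = {
    "zub1cg": {"arch": "B512", "uram": False, "dsp48_high": False},
    "u96v2": {"arch": "B2304", "uram": False, "dsp48_high": False},
    "uz7ev": {"arch": "B4096", "uram": True, "dsp48_high": True},
}


def dpu_conf_defines(hwcore):
    """Return the main `define lines for one target (macro names are assumed)."""
    s = DPU_SETTINGS[hwcore]
    return [
        f"`define {s['arch']}",
        "`define URAM_ENABLE" if s["uram"] else "`define URAM_DISABLE",
        "`define RAM_USAGE_LOW",  # low RAM usage on all three targets
        "`define DSP48_USAGE_HIGH" if s["dsp48_high"] else "`define DSP48_USAGE_LOW",
    ]


for hwcore in DPU_SETTINGS:
    print(hwcore, dpu_conf_defines(hwcore))
```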
The DPU connectivity is specified in the following file for each hardware target:
tria-vitis-platforms/{hwcore}/overlays/examples/benchmark/prj_conf/prj_config_{1dpu|2dpu}
In this file, the following are specified for each DPU instance:
- DPU clk : DPUCZDX8G_{id}.aclk
- DPU clkx2 : DPUCZDX8G_{id}.ap_clk_2
- DPU low speed AXI connectivity : DPUCZDX8G_{id}.M_AXI_GP0
- DPU high speed AXI connectivity #1 : DPUCZDX8G_{id}.M_AXI_HP0
- DPU high speed AXI connectivity #2 : DPUCZDX8G_{id}.M_AXI_HP2
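The connectivity file is a v++ configuration file. As a sketch, the following Python renders its [clock] and [connectivity] sections for one or more DPU instances, using the freqHz=, nk= and sp= options of v++. The 300 MHz aclk (and 600 MHz ap_clk_2) matches the DPU frequency reported by xdputil later in this project; the platform port names passed in (HPC0, HP0, HP1) are placeholders that must match the target platform, and prj_config is a hypothetical helper name.

```python
def prj_config(ports_per_dpu, aclk_hz=300_000_000):
    """Render [clock] and [connectivity] sections for DPUCZDX8G instances.

    ports_per_dpu holds one {DPU AXI master: platform port} dict per instance.
    """
    lines = ["[clock]"]
    for i in range(1, len(ports_per_dpu) + 1):
        cu = f"DPUCZDX8G_{i}"
        lines.append(f"freqHz={aclk_hz}:{cu}.aclk")
        lines.append(f"freqHz={2 * aclk_hz}:{cu}.ap_clk_2")  # the clkx2 input
    lines += ["", "[connectivity]", f"nk=DPUCZDX8G:{len(ports_per_dpu)}"]
    for i, ports in enumerate(ports_per_dpu, start=1):
        for master, platform_port in ports.items():
            lines.append(f"sp=DPUCZDX8G_{i}.{master}:{platform_port}")
    return "\n".join(lines) + "\n"


# One DPU (the prj_config_1dpu case); port names are placeholders.
print(prj_config([{"M_AXI_GP0": "HPC0", "M_AXI_HP0": "HP0", "M_AXI_HP2": "HP1"}]))
```

Taking one port map per instance lets a 2-DPU configuration spread its instances across different platform ports.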
When complete, the build artifacts will be found in the following directory:
overlays/examples/benchmark/binary_container_1/sd_card
The following file is required to compile models for our specific DPU architecture (B512, low RAM usage, etc.):
- arch.json : required to compile models for the specific DPU architecture
These are the files that we need to create the firmware for the overlay:
- *_wrapper.bit : bitstream for PL design
- dpu.xclbin : description of accelerators (base address, interrupt, etc...)
Now that we have created the Vitis overlays for each Vitis accelerated design, we can add these accelerated designs as firmware apps, which we will name:
- {vendor}-{hwcore}-{design}
Petalinux provides a command to create a yocto recipe for these firmware apps:
$ petalinux-create -t apps \
    --template dfx_user_dts -n {firmware} \
    --enable \
    --srcuri "{path}/{firmware}.bit {path}/{firmware}.dtsi {path}/{firmware}.xclbin {path}/shell.json" \
    --force
Using the above command, the following firmware recipes were created in the tria-vitis-platforms GitHub repository:
Instead of including a copy of the bitstream and xclbin, symbolic links are used to point to these files in the Vitis Overlay's sd_card directory structure.
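A minimal Python sketch of this linking step follows. The function name and directory arguments are hypothetical; it assumes the overlay's sd_card directory holds exactly one *_wrapper.bit and a dpu.xclbin, as listed earlier, and it creates relative links so the repository can be cloned anywhere. The .dtsi and shell.json files are not touched by this sketch.

```python
import os
from pathlib import Path


def link_overlay_artifacts(recipe_files, sd_card, firmware):
    """Point {firmware}.bit and {firmware}.xclbin at the overlay build outputs."""
    recipe_files, sd_card = Path(recipe_files), Path(sd_card)
    bitstreams = sorted(sd_card.glob("*_wrapper.bit"))
    if len(bitstreams) != 1:
        raise FileNotFoundError(f"expected one *_wrapper.bit in {sd_card}, found {len(bitstreams)}")
    targets = {
        f"{firmware}.bit": bitstreams[0],
        f"{firmware}.xclbin": sd_card / "dpu.xclbin",
    }
    recipe_files.mkdir(parents=True, exist_ok=True)
    for name, target in targets.items():
        link = recipe_files / name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale copy or link
        # A relative target keeps the link valid wherever the repository is cloned.
        link.symlink_to(os.path.relpath(target, recipe_files))
    return [recipe_files / name for name in targets]
```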
For the ZUBoard, the following firmware recipes were created:
- tria-zub1cg-benchmark
- tria-zub1cg-dualcam_dpu
For the Ultra96-V2, the following firmware recipes were created:
- tria-u96v2-benchmark
- tria-u96v2-dualcam_dpu
For the UltraZed-EV, the following firmware recipes were created:
- tria-uz7ev-benchmark
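Since the same petalinux-create command is repeated for every firmware app, it can be generated. The sketch below only prints one command line per app listed above, following the {vendor}-{hwcore}-{design} convention; the firmware_src/ directory is a hypothetical location for the .bit, .dtsi, .xclbin and shell.json inputs, and petalinux_create_cmd is a hypothetical helper name.

```python
import shlex

# Firmware apps listed above, per hardware target ({vendor}-{hwcore}-{design}).
DESIGNS = {
    "zub1cg": ["benchmark", "dualcam_dpu"],
    "u96v2": ["benchmark", "dualcam_dpu"],
    "uz7ev": ["benchmark"],
}


def petalinux_create_cmd(firmware, path):
    """Return the petalinux-create argument list for one firmware app."""
    srcuri = " ".join(f"{path}/{name}" for name in (
        f"{firmware}.bit", f"{firmware}.dtsi", f"{firmware}.xclbin", "shell.json"))
    return ["petalinux-create", "-t", "apps", "--template", "dfx_user_dts",
            "-n", firmware, "--enable", "--srcuri", srcuri, "--force"]


for hwcore, designs in DESIGNS.items():
    for design in designs:
        firmware = f"tria-{hwcore}-{design}"
        # "firmware_src/..." is a hypothetical location for the four input files
        print(shlex.join(petalinux_create_cmd(firmware, f"firmware_src/{firmware}")))
```

Each argument list could also be passed directly to subprocess.run() from inside the Petalinux project directory, which avoids quoting the space-separated --srcuri value by hand.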
In order to verify the firmware apps, write the SD card image generated by the hardware target's petalinux project to an SD card, and boot the board.
Login as the "root" user as follows:
zub1cg-sbc-2023-2 login: root
root@zub1cg-sbc-2023-2:~#
If we made it this far, our petalinux project is working. We can now verify our firmware packages, which are referred to as "apps" by xmutil.
We start by querying which "apps" are present:
root@zub1cg-sbc-2023-2:~# xmutil listapps
Accelerator Accel_type Base Base_type #slots Active
tria-zub1cg-base XRT_FLAT tria-zub1cg-base XRT_FLAT (0+0) -1
tria-zub1cg-dualcam XRT_FLAT tria-zub1cg-dualcam XRT_FLAT (0+0) -1
tria-zub1cg-benchmark XRT_FLAT tria-zub1cg-benchmark XRT_FLAT (0+0) -1
tria-zub1cg-dualcam_dpu XRT_FLAT tria-zub1cg-dualcam_dpu XRT_FLAT (0+0) -1
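When scripting these checks, the listapps table can be parsed to find which app is active. This is a minimal sketch that assumes the whitespace-separated column layout shown above, with -1 in the last column for inactive apps; the sample mirrors the output captured later in this project, once the benchmark app is loaded at boot. On the target, the text would come from subprocess.run(["xmutil", "listapps"], capture_output=True, text=True).stdout. The function name active_apps is hypothetical.

```python
def active_apps(listapps_output):
    """Return the accelerators whose last (active slot) column is not -1."""
    active = []
    for line in listapps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        slot = fields[-1].rstrip(",")  # the captured output shows "0," for slot 0
        if slot != "-1":
            active.append(fields[0])
    return active


sample = """\
Accelerator Accel_type Base Base_type #slots Active
tria-zub1cg-base XRT_FLAT tria-zub1cg-base XRT_FLAT (0+0) -1
tria-zub1cg-benchmark XRT_FLAT tria-zub1cg-benchmark XRT_FLAT (0+0) 0,
"""
print(active_apps(sample))  # ['tria-zub1cg-benchmark']
```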
Now that we have verified our common Petalinux image and the presence of our firmware apps, we can start verifying these apps.
Verifying these apps is an iterative process, due to the complexity of the dynamic device tree content.
Verifying the "benchmark" app
We start with the "benchmark" app, which shares the same device tree content as the "base" app.
root@zub1cg-sbc-2023-2:~# xmutil loadapp tria-zub1cg-benchmark
[ 167.873070] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /fpga-full/firmware-name
[ 167.883198] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /fpga-full/resets
[ 167.893509] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/status
[ 167.903957] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-pcm0/status
[ 167.916048] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-pcm1/status
[ 167.928139] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-card/status
[ 167.940230] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-codec0/status
[ 167.952517] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/afi0
[ 167.962014] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_gpio_0
[ 167.972026] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_gpio_1
[ 167.982033] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_gpio_2
[ 167.992041] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_iic_0
[ 168.001963] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_iic_1
[ 168.011884] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_iic_2
[ 168.021805] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_intc_0
[ 168.031816] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_quad_spi_0
[ 168.042177] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_uartlite_0
[ 168.052533] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/system_management_wiz_0
[ 168.735789] OF: graph: no port node found in /axi/display@fd4a0000
[ 168.736245] zocl-drm axi:zyxclmm_drm: error -ENXIO: IRQ index 32 not found
tria-zub1cg-benchmark: loaded to slot 0
root@zub1cg-sbc-2023-2:~# xmutil desktop_enable
Note that the following warnings also occur with a working dynamic device tree, so they can be ignored:
OF: overlay: WARNING: memory leak will occur if overlay removed, ...
Note that after loading the benchmark overlay, the content of /etc/vart.conf changes to the following:
root@zub1cg-sbc-2023-2:~# cat /etc/vart.conf
firmware: /lib/firmware/xilinx/tria-zub1cg-benchmark/tria-zub1cg-benchmark.xclbin
We can query the active DPU enabled design with the xdputil utility:
root@zub1cg-sbc-2023-2:~# xdputil query
{
"DPU IP Spec":{
"DPU Core Count":1,
"IP version":"v4.1.0",
"generation timestamp":"2023-02-21 21-30-00",
"git commit id":"7d32c41",
"git commit time":2023022121,
"regmap":"1to1 version"
},
"VAI Version":{
"libvart-runner.so":"Xilinx vart-runner Version: 3.5.0-b7953a2a9f60e23efdfced5c186328dd1449665c 2024-09-10-16:52:14 ",
"libvitis_ai_library-dpu_task.so":"Advanced Micro Devices vitis_ai_library dpu_task Version: 3.5.0-b7953a2a9f60e23efdfced5c186328dd1449665c 2023-06-29 03:20:28 [UTC] ",
"libxir.so":"Xilinx xir Version: xir-b7953a2a9f60e23efdfced5c186328dd1449665c 2024-09-09-13:42:04",
"target_factory":"target-factory.3.5.0 b7953a2a9f60e23efdfced5c186328dd1449665c"
},
"kernels":[
{
"AIE Frequency (Hz)":0,
"DPU Arch":"DPUCZDX8G_ISA1_B512_0101000016010200",
"DPU Frequency (MHz)":300,
"IP Type":"DPU",
"Load Parallel":2,
"Load augmentation":"enable",
"Load minus mean":"disable",
"Save Parallel":2,
"XRT Frequency (MHz)":300,
"cu_addr":"0xb0000000",
"cu_handle":"0xaaaabcac0930",
"cu_idx":0,
"cu_mask":1,
"cu_name":"DPUCZDX8G:DPUCZDX8G_1",
"device_id":0,
"fingerprint":"0x101000016010200",
"name":"DPU Core 0"
}
]
}
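For scripted checks, the JSON printed by xdputil query can be reduced to the fields that matter here. This sketch assumes the command prints only the JSON document shown above; the sample is trimmed from that output, and dpu_summary is a hypothetical helper name.

```python
import json
import re


def dpu_summary(query_json):
    """Summarize each DPU kernel reported by `xdputil query`."""
    info = json.loads(query_json)
    summary = []
    for kernel in info.get("kernels", []):
        # "DPU Arch" looks like DPUCZDX8G_ISA1_B512_0101000016010200
        size = re.search(r"_(B\d+)_", kernel["DPU Arch"]).group(1)
        summary.append({
            "cu_name": kernel["cu_name"],
            "size": size,
            "mhz": kernel["DPU Frequency (MHz)"],
            "fingerprint": kernel["fingerprint"],
        })
    return summary


sample = """{"kernels": [{"DPU Arch": "DPUCZDX8G_ISA1_B512_0101000016010200",
  "DPU Frequency (MHz)": 300, "cu_name": "DPUCZDX8G:DPUCZDX8G_1",
  "fingerprint": "0x101000016010200"}]}"""
print(dpu_summary(sample))
```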
We can also query the status of the DPU:
root@zub1cg-sbc-2023-2:~# xdputil status
{
"kernels":[
{
"addrs_registers":{
"dpu0_base_addr_0":"0x0",
"dpu0_base_addr_1":"0x0",
"dpu0_base_addr_2":"0x0",
"dpu0_base_addr_3":"0x0",
"dpu0_base_addr_4":"0x0",
"dpu0_base_addr_5":"0x0",
"dpu0_base_addr_6":"0x0",
"dpu0_base_addr_7":"0x0"
},
"common_registers":{
"ADDR_CODE":"0x0",
"AP status":"idle",
"CONV END":0,
"CONV START":0,
"HP_ARCOUNT_MAX":7,
"HP_ARLEN":15,
"HP_AWCOUNT_MAX":7,
"HP_AWLEN":15,
"LOAD END":0,
"LOAD START":0,
"MISC END":0,
"MISC START":0,
"SAVE END":0,
"SAVE START":0
},
"name":"DPU Registers Core 0"
}
]
}
Further verification requires xmodel files that have been compiled for this specific DPU architecture (DPU B512, low RAM usage,...).
We can also verify that we can unload the benchmark app:
root@zub1cg-sbc-2023-2:~# xmutil desktop_disable
root@zub1cg-sbc-2023-2:~# xmutil unloadapp
[ 50.960995] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/zyxclmm_drm
[ 50.976432] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/axi_quad_spi@a0070000
[ 50.992320] OF: ERROR: memory leak, expected refcount 1 instead of 226, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/interrupt-controller@a0060000
[ 51.009046] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/i2c@a0050000
[ 51.024110] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/i2c@a0040000
[ 51.039171] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/i2c@a0030000
[ 51.054223] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/gpio@a0020000
[ 51.069353] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/gpio@a0010000
[ 51.084479] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/gpio@a0000000
[ 51.099625] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /axi/afi0
[ 51.114232] OF: ERROR: memory leak before free overlay changeset, /axi/axi_quad_spi@a0070000
[ 51.158328] OF: ERROR: memory leak before free overlay changeset, /axi/i2c@a0050000
[ 51.166120] OF: ERROR: memory leak before free overlay changeset, /axi/i2c@a0040000
[ 51.173886] OF: ERROR: memory leak before free overlay changeset, /axi/i2c@a0030000
[ 51.186943] OF: ERROR: memory leak before free overlay changeset, /axi/gpio@a0020000
[ 51.194789] OF: ERROR: memory leak before free overlay changeset, /axi/gpio@a0010000
[ 51.202632] OF: ERROR: memory leak before free overlay changeset, /axi/gpio@a0000000
[ 51.210478] OF: ERROR: memory leak before free overlay changeset, /axi/afi0
remove from slot 0 returns: 0 (Ok)
Note that the following errors also occur with a working dynamic device tree, so they can be ignored:
OF: ERROR: memory leak before free overlay changeset, ...
Verifying the "dualcam_dpu" app
Next, we tackle the more complex "dualcam_dpu" app, which shares the same device tree content as the "dualcam" app.
root@zub1cg-sbc-2023-2:~# xmutil loadapp tria-zub1cg-dualcam-dpu
[ 81.941945] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /fpga-full/firmware-name
[ 81.952074] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /fpga-full/resets
[ 81.962656] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/status
[ 81.973105] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-pcm0/status
[ 81.985196] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-pcm1/status
[ 81.997284] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-card/status
[ 82.009375] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /axi/display@fd4a0000/zynqmp-dp-snd-codec0/status
[ 82.021665] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/afi0
[ 82.031160] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/ap1302_osc
[ 82.041171] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/misc_clk_0
[ 82.051181] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/axi_iic_0
[ 82.061109] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/ias_out0
[ 82.070945] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.083478] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/mipi_csi_portsCAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.097227] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/mipi_csi_port1CAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.110976] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/mipi_csirx_outCAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.124727] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/mipi_csi_port0CAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.138474] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/mipi_csi_inCAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.151963] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_frmbuf_wr_0
[ 82.163715] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.175633] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/csc_portsCAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.188341] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/csc_port1CAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.201048] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/csc_outCAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.213573] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/csc_port0CAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.226275] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_proc_ss_csc_0CAPTURE_PIPELINE_mipi_csi2_rx_subsyst_0
[ 82.241585] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.253771] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/scaler_portsCAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.267000] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/scaler_port1CAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.280227] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/sca_outCAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.293022] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/scaler_port0CAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.306250] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_proc_ss_scaler_0CAPTURE_PIPELINE_v_proc_ss_csc_0
[ 82.321219] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/GPIO_axi_gpio_0
[ 82.331668] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/system_management_wiz_0
[ 82.342810] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/zocl
[ 82.352306] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/vcap_portsCAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.365360] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/vcap_portCAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 82.378330] OF: overlay: WARNING: memory leak will occur if overlay removed, property: /__symbols__/CAPTURE_PIPELINE_v_frmbuf_wr_0CAPTURE_PIPELINE_v_proc_ss_scaler_0
[ 83.031177] OF: graph: no port node found in /axi/display@fd4a0000
[ 83.523638] debugfs: Directory '2-003c' with parent 'regmap' already present!
[ 83.548455] zocl-drm axi:zyxclmm_drm: error -ENXIO: IRQ index 8 not found
tria-zub1cg-dualcam-dpu: loaded to slot 0
root@zub1cg-sbc-2023-2:~# xmutil desktop_enable
Once again, the content of /etc/vart.conf was changed to reflect the active firmware (app):
root@zub1cg-sbc-2023-2:~# cat /etc/vart.conf
firmware: /lib/firmware/xilinx/tria-zub1cg-dualcam-dpu/tria-zub1cg-dualcam-dpu.xclbin
We can query the active DPU enabled design with the xdputil utility:
root@zub1cg-sbc-2023-2:~# xdputil query
{
"DPU IP Spec":{
"DPU Core Count":1,
"IP version":"v4.1.0",
"generation timestamp":"2023-02-21 21-30-00",
"git commit id":"7d32c41",
"git commit time":2023022121,
"regmap":"1to1 version"
},
"VAI Version":{
"libvart-runner.so":"Xilinx vart-runner Version: 3.5.0-b7953a2a9f60e23efdfced5c186328dd1449665c 2024-09-10-16:52:14 ",
"libvitis_ai_library-dpu_task.so":"Advanced Micro Devices vitis_ai_library dpu_task Version: 3.5.0-b7953a2a9f60e23efdfced5c186328dd1449665c 2023-06-29 03:20:28 [UTC] ",
"libxir.so":"Xilinx xir Version: xir-b7953a2a9f60e23efdfced5c186328dd1449665c 2024-09-09-13:42:04",
"target_factory":"target-factory.3.5.0 b7953a2a9f60e23efdfced5c186328dd1449665c"
},
"kernels":[
{
"AIE Frequency (Hz)":0,
"DPU Arch":"DPUCZDX8G_ISA1_B128_0101000002010208",
"DPU Frequency (MHz)":300,
"IP Type":"DPU",
"Load Parallel":2,
"Load augmentation":"disable",
"Load minus mean":"disable",
"Save Parallel":2,
"XRT Frequency (MHz)":300,
"cu_addr":"0xa0020000",
"cu_handle":"0xaaaaf3e7b550",
"cu_idx":0,
"cu_mask":1,
"cu_name":"DPUCZDX8G:DPUCZDX8G_1",
"device_id":0,
"fingerprint":"0x101000002010208",
"name":"DPU Core 0"
}
]
}
Compiling the Model Zoo
Before we can use the AMD/Xilinx Model Zoo with the specific DPU architecture included in our design, we need to compile its models.
The Vitis-AI Model Zoo has evolved to support different frameworks over time, as shown in the previous image and the following animation:
The first version of Vitis-AI supported the Caffe & DarkNet frameworks. This support was deprecated at version 2.5.
Support for the PyTorch framework started at version 1.3, and has been the framework with the most momentum.
It is important to note that support for the Zynq UltraScale+ devices was frozen at Vitis-AI 3.0, which is why we are interested in this version for this project.
The Vitis-AI 3.5 release was done exclusively to support the Versal AI Edge devices.
The Vitis-AI 3.0 models can be downloaded from the Xilinx web site with the provided downloader.py Python script:
$ cd ~/Avnet_2023_2/
$ git clone --branch v3.0 https://github.com/Xilinx/Vitis-AI
$ cd Vitis-AI/model_zoo
$ python downloader.py
Each model can be compiled by following the on-line documentation:
https://xilinx.github.io/Vitis-AI/docs/workflow-model-zoo.html
For convenience, I am providing an archive of pre-compiled models for the following specific DPU architectures.
Download the following model archive(s) that correspond to the DPU architectures for your target hardware:
- Vitis-AI 3.0 - B128 models: vitis-ai-3.0-models.b128-lr.tar.gz (2024/10/13, md5sum = d5d651a2154e645045d0161a7ea2483b)
- Vitis-AI 3.0 - B512 models: vitis-ai-3.0-models.0-b512-lr.tar.gz (2024/10/13, md5sum = e1b8749730c1e14bed728a7b3fdef29b)
- Vitis-AI 3.0 - B1152 models: vitis-ai-3.0-models.b1152-hr.tar.gz (2024/10/13, md5sum = 1a03e3d68d9ad820198a8bbded4e57f8)
- Vitis-AI 3.0 - B2304 models: vitis-ai-3.0-models.b2304-lr.tar.gz (2024/10/13, md5sum = 14c70db39554c405456c4107f9085274)
- Vitis-AI 3.0 - B4096 models: vitis-ai-3.0-models.b4096-lr.1of2.tar.gz (2024/10/13, md5sum = c101448c57988206ea1b0e053ba9af64) and vitis-ai-3.0-models.b4096-lr.2of2.tar.gz (2024/10/13, md5sum = d3bf7bbd747a6f2e3a6c851f63178497)
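Before extracting, it is worth checking each download against the md5sum listed above (the same check that md5sum performs on the command line). A minimal Python sketch with the published checksums copied into a dictionary; md5_of and verify_archive are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# md5sums published above, keyed by archive file name
EXPECTED_MD5 = {
    "vitis-ai-3.0-models.b128-lr.tar.gz": "d5d651a2154e645045d0161a7ea2483b",
    "vitis-ai-3.0-models.0-b512-lr.tar.gz": "e1b8749730c1e14bed728a7b3fdef29b",
    "vitis-ai-3.0-models.b1152-hr.tar.gz": "1a03e3d68d9ad820198a8bbded4e57f8",
    "vitis-ai-3.0-models.b2304-lr.tar.gz": "14c70db39554c405456c4107f9085274",
    "vitis-ai-3.0-models.b4096-lr.1of2.tar.gz": "c101448c57988206ea1b0e053ba9af64",
    "vitis-ai-3.0-models.b4096-lr.2of2.tar.gz": "d3bf7bbd747a6f2e3a6c851f63178497",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path):
    """True if the archive's md5 matches the published value for its file name."""
    path = Path(path)
    return md5_of(path) == EXPECTED_MD5[path.name]
```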
Then extract the archive to the /usr/share/vitis_ai_library directory, and create a symbolic link to the version that you want to use:
root@zub1cg-sbc-2023-2:~# cd /usr/share/vitis_ai_library
root@zub1cg-sbc-2023-2:/usr/share/vitis_ai_library# tar -xvzf ~/vitis-ai-3.0-models.0-b512-lr.tar.gz
root@zub1cg-sbc-2023-2:/usr/share/vitis_ai_library# ln -sf models.b512-lr models
Installing the Vitis-AI examples
The Vitis-AI examples can be found in the Vitis-AI repository under the examples directory:
Vitis-AI
├── ...
└── examples
    ├── ...
    ├── vai_library
    ├── ...
    ├── vai_runtime
    └── ...
These can be copied to the root file system of the SD card image.
They also require archives of images and video files, which can be downloaded from the following links:
- vai_library - images: vitis_ai_library_r3.5.0_images.tar.gz
- vai_library - videos: vitis_ai_library_r3.5.0_video.tar.gz
- vai_runtime - images & videos: vitis_ai_runtime_r3.5.0_image_video.tar.gz
For convenience, I am providing an archive of pre-compiled examples, including the previous images and videos, that can be downloaded from a single source.
Download the following examples archive to your target hardware's root file system (i.e. using SSH):
- https://avnet.me/vitis-ai-3.0-examples (2023/04/04, md5sum = 0af9ab73387ef8cc0f90e15bddbcbdb4)
Then extract the archive to the /home/root (~) directory:
root@zub1cg-sbc-2023-2:~# cd ~
root@zub1cg-sbc-2023-2:~# tar -xvzf vitis-ai-3.0-examples.tar.gz
Notice again that I am providing version 3.0 of the Vitis-AI examples, instead of version 3.5. The Vitis-AI 3.5 examples are essentially the same; the only differences replace support for the Zynq UltraScale+ hardware (ZCU102, ZCU104, KV260) with support for the Versal AI Edge hardware (VEK280), which does not interest us in this case.
Automatically booting the "benchmark" app
The user can configure the image to automatically load one of the firmware apps at boot. Before attempting to automatically load an app, make sure it loads successfully (as we did previously) and does not crash the system. Otherwise, your SD image will crash at every boot.
The /etc/dfx-mgrd/daemon.conf file uses its "default_accel" entry to point to the file (/etc/dfx-mgrd/default_firmware) that names the firmware app to load at boot.
root@zub1cg-sbc-2023-2:~# cat /etc/dfx-mgrd/daemon.conf
{
"firmware_location": ["/lib/firmware/xilinx"],
"default_accel":"/etc/dfx-mgrd/default_firmware"
}
The /etc/dfx-mgrd/default_firmware file may not exist yet, but it can be created as follows:
root@zub1cg-sbc-2023-2:~# cat /etc/dfx-mgrd/default_firmware
cat: /etc/dfx-mgrd/default_firmware: No such file or directory
root@zub1cg-sbc-2023-2:~# echo tria-zub1cg-benchmark > /etc/dfx-mgrd/default_firmware
root@zub1cg-sbc-2023-2:~# cat /etc/dfx-mgrd/default_firmware
tria-zub1cg-benchmark
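The same steps can be scripted. This Python sketch reads daemon.conf, checks that the requested app is installed under one of the firmware_location directories (apps live in per-app directories such as /lib/firmware/xilinx/tria-zub1cg-benchmark, as the vart.conf paths earlier show), and writes the app name to the default_accel file. set_default_firmware is a hypothetical helper name; it would be run as root on the target.

```python
import json
from pathlib import Path


def set_default_firmware(app, daemon_conf="/etc/dfx-mgrd/daemon.conf"):
    """Write `app` into the file that daemon.conf names as default_accel."""
    conf = json.loads(Path(daemon_conf).read_text())
    # Refuse names that are not installed in any configured firmware location.
    locations = [Path(loc) for loc in conf["firmware_location"]]
    if not any((loc / app).is_dir() for loc in locations):
        raise FileNotFoundError(f"{app} not found under {conf['firmware_location']}")
    default_file = Path(conf["default_accel"])
    default_file.write_text(app + "\n")  # same content as `echo app > file`
    return default_file
```

Validating the name first avoids writing a default_firmware entry that does not match any installed app.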
The change will take effect at the next boot.
root@zub1cg-sbc-2023-2:~# reboot
After boot, we start by querying which "apps" are present:
root@zub1cg-sbc-2023-2:~# xmutil listapps
Accelerator Accel_type Base Base_type #slots Active_slot
tria-zub1cg-base XRT_FLAT tria-zub1cg-base XRT_FLAT (0+0) -1
tria-zub1cg-dualcam XRT_FLAT tria-zub1cg-dualcam XRT_FLAT (0+0) -1
tria-zub1cg-benchmark XRT_FLAT tria-zub1cg-benchmark XRT_FLAT (0+0) 0,
tria-zub1cg-dualcam_dpu XRT_FLAT tria-zub1cg-dualcam_dpu XRT_FLAT (0+0) -1
Notice that the tria-zub1cg-benchmark app has been loaded (Active_slot 0). The same can be done for any firmware (design) on each platform.
We can query the active DPU enabled design with the xdputil utility:
root@zub1cg-sbc-2023-2:~# xdputil query
{
...
"DPU Arch":"DPUCZDX8G_ISA1_B512_0101000016010200",
...
}
!!! VERY IMPORTANT !!!
Make certain that the models library corresponds to the DPU architecture of the loaded design.
root@zub1cg-sbc-2023-2:~# cd /usr/share/vitis_ai_library
root@zub1cg-sbc-2023-2:/usr/share/vitis_ai_library# ls -la
models.b512-lr
models.b128-lr
models -> models.b512-lr
If the models link does not correspond to the loaded DPU architecture (i.e. B512), re-create the symbolic link:
root@zub1cg-sbc-2023-2:/usr/share/vitis_ai_library# rm models
root@zub1cg-sbc-2023-2:/usr/share/vitis_ai_library# ln -sf models.b512-lr models
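This check can also be automated. The Python sketch below extracts the DPU size from the "DPU Arch" string reported by xdputil query and compares it with the directory that the models symlink points to. It assumes the models.<size>-<ram mode> directory naming used in this project (models.b512-lr, models.b128-lr); models_match_dpu is a hypothetical helper name.

```python
import os
import re
from pathlib import Path


def models_match_dpu(dpu_arch, library="/usr/share/vitis_ai_library"):
    """True if the `models` symlink points at the directory for the loaded DPU size."""
    size = re.search(r"_(B\d+)_", dpu_arch).group(1).lower()  # e.g. "b512"
    target = Path(os.readlink(Path(library) / "models")).name  # e.g. "models.b512-lr"
    return re.fullmatch(rf"models\.{size}-[a-z]+", target) is not None
```

On the target, passing the "DPU Arch" value from xdputil query, e.g. models_match_dpu("DPUCZDX8G_ISA1_B512_0101000016010200"), should return True with the links shown above.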
Executing the Vitis-AI Examples
There are too many examples to cover in this section, but we can cover an alternative to the face detection example: face mask detection.
root@zub1cg-sbc-2023-2:~# cd Vitis-AI/examples/vai_library/samples/yolov4
root@zub1cg-sbc-2023-2:~/Vitis-AI/examples/vai_library/samples/yolov4# ./test_video_yolov4 face_mask_detection_pt 0
The current version of this project has the following known issues:
- the "benchmark" app for UltraZed-EV does not load properly
I'm working on it...
Conclusion
I hope this tutorial helped you understand how to add Vitis-AI 3.5 functionality to your custom platform.
If you would like to have the pre-built SDcard image for this project, please let me know in the comments below.
Revision History
2023/11/11 - Preliminary Version