In early June Avnet announced the ZUBoard, featuring the ZUB1CG, the smallest device in the AMD-Xilinx Zynq UltraScale+ MPSoC family.
This project applies the same design methodology used by the Kria family, allowing users to dynamically load their various accelerated applications from a single SD image.
Design Overview
The pre-built image includes the following accelerated apps:
- avnet-zub1cg-benchmark
- avnet-zub1cg-dualcam-dpu
- avnet-zub1cg-ar0144-dual
- avnet-zub1cg-ar0144-single
- avnet-zub1cg-ar1335-single
The avnet-zub1cg-benchmark accelerated app features the Vitis-AI 2.0 samples, implemented with the Deep Learning Processing Unit (DPU).
The avnet-zub1cg-dualcam-dpu accelerated app implements a MIPI capture pipeline, in addition to the DPU.
The avnet-zub1cg-ar0144-dual, avnet-zub1cg-ar0144-single, and avnet-zub1cg-ar1335-single apps share the same Vivado-only hardware design that contains the MIPI capture pipeline.
The avnet-zub1cg-dualcam-dpu and avnet-zub1cg-ar0144-dual apps configure the design (via device tree) for the following ar0144-dual configuration:
The On Semiconductor AP1302 device is an ISP that synchronously captures images from the two AR0144 image sensors, and provides a single side-by-side image on its MIPI interface.
The avnet-zub1cg-ar0144-single app configures the design (via device tree) for the following ar0144-single configuration:
The AP1302 ISP captures images from the AR0144 image sensor populated on the right side.
The avnet-zub1cg-ar1335-single app configures the design (via device tree) for the following ar1335-single configuration:
The AP1302 ISP captures images from the AR1335 image sensor populated on the right side, and implements auto-gain, auto-white-balance, and auto-focus.
A pre-built image is provided with a set of "accelerated apps" that can be dynamically loaded in the programmable logic.
To get started, download the following image and program it to a microSD card (16GB or larger):
- http://avnet.me/avnet-zub1cg-sbc-2021.2-sdimage
(2022/09/21 - md5sum : 1b021a30f10629acc077aa977f88582d)
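Before flashing, it is worth checking the download against the published checksum. A minimal sketch, assuming the downloaded file keeps the name from the URL above (substitute whatever name it was actually saved under):

```shell
# Verify the download against the published md5sum before flashing;
# the filename below is an assumption based on the download URL.
IMAGE=avnet-zub1cg-sbc-2021.2-sdimage.img
if [ -f "$IMAGE" ]; then
  echo "1b021a30f10629acc077aa977f88582d  $IMAGE" | md5sum -c -
else
  echo "download $IMAGE first" >&2
fi
# Writing the image is destructive - double-check the device node first:
# sudo dd if="$IMAGE" of=/dev/sdX bs=4M status=progress conv=fsync
```

The dd command is left commented out on purpose; a GUI tool such as Balena Etcher is a safer choice if you are unsure which device node is the microSD card.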
Configure the ZUBoard as shown in the following diagram:
The DualCam SYZYGY is optional; it allows you to run the stereo examples.
A live tour was given for a subset of the demos during the "Learn Embedded Design with the ZUBoard 1CG" webinar:
After booting the ZUBoard, the list of "accelerated apps" can be queried with the xmutil utility.
$ xmutil listapps
Accelerator Base Type #slots Active
avnet-zub1cg-benchmark avnet-zub1cg-benchmark XRT_FLAT 0 0,
avnet-zub1cg-dualcam-dpu avnet-zub1cg-dualcam-dpu XRT_FLAT 0 -1
avnet-zub1cg-ar0144-dual avnet-zub1cg-ar0144-dual XRT_FLAT 0 -1
avnet-zub1cg-ar0144-single avnet-zub1cg-ar0144-single XRT_FLAT 0 -1
avnet-zub1cg-ar1335-single avnet-zub1cg-ar1335-single XRT_FLAT 0 -1
This output indicates that the "avnet-zub1cg-benchmark" app is loaded by default at boot. This is determined by the following file, which can be modified if desired:
$ cat /etc/dfx-mgrd/default_firmware
avnet-zub1cg-benchmark
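To boot into a different app by default, overwrite that file with the desired app name. A sketch (the fallback to a local scratch file is only there so the snippet can be tried off-target, where the real path does not exist):

```shell
# Change the default app loaded at boot (path taken from the listing
# above); falls back to a local scratch file when run off-target.
CONF=/etc/dfx-mgrd/default_firmware
[ -w "$CONF" ] || CONF=./default_firmware
echo avnet-zub1cg-dualcam-dpu > "$CONF"
cat "$CONF"
```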
Running the "benchmark" demos
The "benchmark" app is analogous to the Kria KV260 "benchmark" app, in the sense that it contains the largest DPU that fits in the device. For the ZUBoard, this is the B512 DPU.
If you have not already done so, load the "benchmark" app using the xmutil utility:
$ xmutil unloadapp
$ xmutil loadapp avnet-zub1cg-benchmark
$ xmutil listapps
Accelerator Base Type #slots Active
avnet-zub1cg-benchmark avnet-zub1cg-benchmark XRT_FLAT 0 0,
avnet-zub1cg-dualcam-dpu avnet-zub1cg-dualcam-dpu XRT_FLAT 0 -1
avnet-zub1cg-ar0144-dual avnet-zub1cg-ar0144-dual XRT_FLAT 0 -1
avnet-zub1cg-ar0144-single avnet-zub1cg-ar0144-single XRT_FLAT 0 -1
avnet-zub1cg-ar1335-single avnet-zub1cg-ar1335-single XRT_FLAT 0 -1
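The unload/load/list sequence can be wrapped in a small shell helper; swap_app is a hypothetical name, and the command -v guard simply lets the function degrade gracefully when xmutil is not present:

```shell
# Hypothetical helper wrapping the xmutil unload/load/list sequence.
swap_app() {
  if ! command -v xmutil >/dev/null 2>&1; then
    echo "xmutil not found - run this on the booted ZUBoard" >&2
    return 0
  fi
  xmutil unloadapp && xmutil loadapp "$1" && xmutil listapps
}
swap_app avnet-zub1cg-benchmark
```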
We can query the details of the DPU (B512) inside this overlay with the xdputil utility:
$ xdputil query
{
"DPU IP Spec":{
"DPU Core Count":1,
"DPU Target Version":"v1.4.1",
"IP version":"v3.4.0",
"generation timestamp":"2021-12-15 10-30-00",
"git commit id":"706bd10",
"git commit time":2112151029,
"regmap":"1to1 version"
},
"VAI Version":{
"libvart-runner.so":"Xilinx vart-runner Version: 2.0.0-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-09-02-17:50:46 ",
"libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 2.0.0-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-01-20 07:11:10 [UTC] ",
"libxir.so":"Xilinx xir Version: xir-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-09-02-17:48:00",
"target_factory":"target-factory.2.0.0 d02dcb6041663dbc7ecbc0c6af9fafa087a789de"
},
"kernels":[
{
"DPU Arch":"DPUCZDX8G_ISA0_B512_01000020F6012200",
"DPU Frequency (MHz)":300,
"IP Type":"DPU",
"Load Parallel":2,
"Load augmentation":"enable",
"Load minus mean":"disable",
"Save Parallel":2,
"XRT Frequency (MHz)":300,
"cu_addr":"0xa0000000",
"cu_handle":"0xaaab11c02d30",
"cu_idx":0,
"cu_mask":1,
"cu_name":"DPUCZDX8G:DPUCZDX8G_1",
"device_id":0,
"fingerprint":"0x1000020f6012200",
"name":"DPU Core 0"
}
]
}
Notice that we have one kernel of type DPU with the B512 architecture:
"DPU Arch":"DPUCZDX8G_ISA0_B512_01000020F6012200",
"DPU Frequency (MHz)":300,
Before running the demos, we need to verify that we are using the B512 version of the pre-compiled ModelZoo, as shown below:
$ cd /usr/share/vitis_ai_library/
$ ls -la
total 40
drwxr-xr-x 6 root root 4096 Sep 16 2022 .
drwxr-xr-x 84 root root 4096 Mar 9 12:34 ..
lrwxrwxrwx 1 root root 11 Mar 9 12:34 models -> models.b512
drwxr-xr-x 147 root root 12288 Sep 16 2022 models.b128
drwxr-xr-x 223 root root 12288 Sep 16 2022 models.b512
drwxr-xr-x 5 root root 4096 Mar 9 12:34 samples
drwxr-xr-x 58 root root 4096 Mar 9 12:34 test
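If the models link has been repointed (for example after trying the dualcam demos), it can be restored to the B512 set. A sketch, to be run from /usr/share/vitis_ai_library:

```shell
# Repoint the link at the B512 models; -f replaces an existing link
# and -n avoids descending into the old link target.
ln -sfn models.b512 models
readlink models
# prints: models.b512
```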
Several Vitis-AI demos are provided in the "~/Vitis-AI/demo/Vitis-AI-Library" directory.
Face Detection
The face detection demo can be run as follows:
$ cd ~/Vitis-AI/demo/Vitis-AI-Library/samples/facedetect
$ ./test_video_facedetect densebox_640_360 0
Pose Estimation
The pose detection demo can be run with the USB camera as follows:
$ cd ~/Vitis-AI/demo/Vitis-AI-Library/samples/posedetect
$ ./test_video_posedetect_with_ssd 0
The pose detection demo can be run with the locally provided video as follows:
$ ./test_video_posedetect_with_ssd ../../../VART/pose_detection/video/pose.mp4
The Vitis-AI demos all include source code and can be modified for your needs, as shown in the following examples:
Face Detection with Tracking
The face detection demo, augmented with centroid-based object tracking, can be run as follows:
$ cd ~/vitis_ai_cpp_examples/facedetectwithtracking
$ ./test_video_facedetectwithtracking 0
More details on how this custom example was created can be found here:
Face Detection with Head Pose Estimation
The face detection demo, augmented with face landmarks and head pose, can be run as follows:
$ cd ~/vitis_ai_cpp_examples/facedetectwithheadpose
$ ./test_video_facedetectwithheadpose 0
More details on how this custom example was created can be found here:
License Plate Recognition
An example chaining multiple neural network inferences has been implemented for the recognition of Asian license plates, and can be run as follows:
$ cd ~/vitis_ai_cpp_examples/platerecognition
$ ./test_video_platerecognition ./video/plate_recognition_video.mp4
More details on how this example was created can be found here:
3D Object Detection
A more advanced example showcasing 3D object detection with lidar point cloud data can be run as follows:
$ cd ~/xilinx_developer/ppdemo/
$ ./demo ./ppd/vlist.txt ./ppd/ 3
More details on how this example was created can be found here:
Multi-Task example
Another more advanced example showcasing a multi-task model (common backbone, multiple heads) can be run as follows:
$ cd ~/Vitis-AI/demo/Vitis-AI-Library/apps/multitask_v3_quad_windows
$ ./multitaskv3_quad_windows_x d58cbda2-97976be7__640x360.avi -t 4
Face Applications
The pre-built image also includes examples written in Python, which can be leveraged to rapidly prototype your own ideas.
The face applications webserver can be run as follows:
$ cd ~/vitis_ai_python_examples/webserver
$ python3 webserver.py
Once the script is running, the served page can be viewed by browsing to the ZUBoard's IP address:
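If the board's address is not known, it can be queried on the target itself; hostname -I is assumed to be available on the image (fall back to ip addr if it is not):

```shell
# Print the board's IPv4 addresses (run on the ZUBoard):
hostname -I
# fall back to this if hostname -I is unsupported on the image:
# ip -4 addr show
```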
Custom Model Training
The programmable nature of the ZUBoard allows users to train and deploy their own custom models with Vitis-AI.
As an example, we went through the exercise of creating a custom dataset for the Dobble card game, and trained a classification model with TensorFlow.
The trained model was then deployed for inference using Vitis-AI:
The dobble classification demo is part of the ZUBoard pre-built image, and can be run as follows:
$ cd ~/dobble_classification
$ python3 dobble_detect_live.py
Running the "dualcam_dpu" demos
The "dualcam_dpu" app implements the B128 version of the DPU, along with a MIPI capture pipeline, configured for the "ar0144-dual" configuration.
If you have not already done so, load the "dualcam_dpu" app using the xmutil utility:
$ xmutil unloadapp
$ xmutil loadapp avnet-zub1cg-dualcam-dpu
$ xmutil listapps
Accelerator Base Type #slots Active
avnet-zub1cg-benchmark avnet-zub1cg-benchmark XRT_FLAT 0 -1
avnet-zub1cg-dualcam-dpu avnet-zub1cg-dualcam-dpu XRT_FLAT 0 0,
avnet-zub1cg-ar0144-dual avnet-zub1cg-ar0144-dual XRT_FLAT 0 -1
avnet-zub1cg-ar0144-single avnet-zub1cg-ar0144-single XRT_FLAT 0 -1
avnet-zub1cg-ar1335-single avnet-zub1cg-ar1335-single XRT_FLAT 0 -1
We can query the details of the DPU (B128) inside this overlay with the xdputil utility:
$ xdputil query
{
"DPU IP Spec":{
"DPU Core Count":1,
"DPU Target Version":"v1.4.1",
"IP version":"v3.4.0",
"generation timestamp":"2021-12-15 10-30-00",
"git commit id":"706bd10",
"git commit time":2112151029,
"regmap":"1to1 version"
},
"VAI Version":{
"libvart-runner.so":"Xilinx vart-runner Version: 2.0.0-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-09-02-17:50:46 ",
"libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 2.0.0-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-01-20 07:11:10 [UTC] ",
"libxir.so":"Xilinx xir Version: xir-d02dcb6041663dbc7ecbc0c6af9fafa087a789de 2022-09-02-17:48:00",
"target_factory":"target-factory.2.0.0 d02dcb6041663dbc7ecbc0c6af9fafa087a789de"
},
"kernels":[
{
"DPU Arch":"DPUCZDX8G_ISA0_B128_01000020E2012208",
"DPU Frequency (MHz)":300,
"IP Type":"DPU",
"Load Parallel":2,
"Load augmentation":"disable",
"Load minus mean":"disable",
"Save Parallel":2,
"XRT Frequency (MHz)":300,
"cu_addr":"0xa0020000",
"cu_handle":"0xaaab00237970",
"cu_idx":0,
"cu_mask":1,
"cu_name":"DPUCZDX8G:DPUCZDX8G_1",
"device_id":0,
"fingerprint":"0x1000020e2012208",
"name":"DPU Core 0"
}
]
}
Notice that we have one kernel of type DPU with the B128 architecture:
"DPU Arch":"DPUCZDX8G_ISA0_B128_01000020E2012208",
"DPU Frequency (MHz)":300,
Before running the demos, we need to verify that we are using the B128 version of the pre-compiled ModelZoo, and update the symbolic link if necessary, as shown below:
$ cd /usr/share/vitis_ai_library/
$ ls -la
total 40
drwxr-xr-x 6 root root 4096 Sep 16 2022 .
drwxr-xr-x 84 root root 4096 Mar 9 12:34 ..
lrwxrwxrwx 1 root root 11 Mar 9 12:34 models -> models.b512
drwxr-xr-x 147 root root 12288 Sep 16 2022 models.b128
drwxr-xr-x 223 root root 12288 Sep 16 2022 models.b512
drwxr-xr-x 5 root root 4096 Mar 9 12:34 samples
drwxr-xr-x 58 root root 4096 Mar 9 12:34 test
$ rm models
$ ln -sf models.b128 models
$ ls -la
total 40
drwxr-xr-x 6 root root 4096 Mar 9 13:05 .
drwxr-xr-x 84 root root 4096 Mar 9 12:34 ..
lrwxrwxrwx 1 root root 11 Mar 9 13:05 models -> models.b128
drwxr-xr-x 147 root root 12288 Sep 16 2022 models.b128
drwxr-xr-x 223 root root 12288 Mar 9 13:05 models.b512
drwxr-xr-x 5 root root 4096 Mar 9 12:34 samples
drwxr-xr-x 58 root root 4096 Mar 9 12:34 test
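A quick sanity check before launching a demo is to compare the active models link against the DPU architecture reported by xdputil. This is only a sketch: the grep pattern pulls the Bnnn token out of the "DPU Arch" string shown earlier, and both commands degrade to "unknown" when run off-target:

```shell
# Compare the active ModelZoo link with the DPU arch reported by
# xdputil; both values fall back to "unknown" off-target.
MODELS=$(readlink /usr/share/vitis_ai_library/models 2>/dev/null)
ARCH=$(xdputil query 2>/dev/null | grep -o '_B[0-9]*_' | head -n1 | tr -d '_')
echo "models link: ${MODELS:-unknown}  DPU arch: ${ARCH:-unknown}"
```

On the target, the two values should agree (models.b128 with B128, models.b512 with B512) before a demo is started.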
Stereo Face Detection
The dual inference example, or stereo face detection, can be run as follows:
$ cd ~/avnet_dualcam_python_examples
$ python3 avnet_ar0144_dual_stereo_face_detection.py
More details on how this example was created can be found here:
Going Further
To learn more about the ZUBoard, please watch the "Learn Embedded Design with the ZUBoard 1CG" webinar:
Don't have a ZUBoard?
If you have an Ultra96-V2 board (with or without the dualcam mezzanine), you can run these same designs using the u96v2 image:
- http://avnet.me/avnet-u96v2-sbc-2021.2-sdimage
(2022/09/21 - md5sum : a04ecf831b4e654f2d13e6641b92a02c)
Reuse the same instructions, but swap out the following when using the Ultra96-V2 board:
- zub1cg => u96v2
- b512 => b2304
- b128 => b1152
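The three substitutions above can be applied mechanically to any of the commands in this article, for example with a hypothetical sed one-liner:

```shell
# Translate a ZUBoard command for the Ultra96-V2 using the
# substitutions listed above:
echo 'xmutil loadapp avnet-zub1cg-benchmark' \
  | sed -e 's/zub1cg/u96v2/g' -e 's/b512/b2304/g' -e 's/b128/b1152/g'
# prints: xmutil loadapp avnet-u96v2-benchmark
```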
Please let me know in the comments below if you are using the Ultra96-V2 based design.
Conclusion
I hope this tutorial, with its pre-built SD card image, will help you get your custom AI applications up and running quickly on the ZUBoard.
If there are any other accelerated apps you would like to see on ZUBoard, please share your thoughts in the comments below.
Revision History
2022/09/23
Update project image. Add instructions to run multi-task-v3 example.
2022/09/21
Update SD image for ZUBoard, and add SD image for Ultra96-V2.
2022/09/15
Add instructions and videos on how to run demos for the following accelerated apps:
- avnet-zub1cg-benchmark
- avnet-zub1cg-dualcam-dpu
Update SD card image.
2022/09/06
Preliminary version, with recorded video covering the following accelerated apps:
- avnet-zub1cg-benchmark
- avnet-zub1cg-dualcam-dpu