Please note that there is a more recent version of this project:
Vitis-AI 1.4 Flow for Avnet VITIS Platforms
Introduction
Avnet recently released Vitis 2019.2 platforms for several of their hardware platforms. These platforms also support the Vitis-AI flow from Xilinx.
The Xilinx Vitis-AI repository ( github.com/Xilinx/Vitis-AI ) provides an excellent tutorial called DPU-TRD on targeting the DPU AI engine to a custom Vitis platform.
This guide is part 1 of 2 which provides detailed instructions for targeting the Xilinx Vitis-AI 1.1 flow to the following Avnet Vitis 2019.2 platforms:
- Ultra96-V2 Development Board
- UltraZed-EV SOM (7EV) + FMC Carrier Card
- UltraZed-EG SOM (3EG) + IO Carrier Card
- UltraZed-EG SOM (3EG) + PCIEC Carrier Card
Part 1 covers the use of the existing Avnet Vitis platforms to target the Vitis-AI 1.1 DNNDK API based application examples.
Part 2 will cover how to modify the Avnet Vitis platforms in order to target the Vitis-AI 1.1 VART based application examples.
Once the tools have been set up, there are five (5) main steps to targeting an AI application to one of the Avnet platforms:
- 1 - Build the Hardware Design
- 2 - Compile the Model from the Xilinx AI Model Zoo
- 3 - Build the AI applications
- 4 - Create the SD card content
- 5 - Execute the AI applications on hardware
IMPORTANT NOTE : The Ultra96-V2 Development Board requires a PMIC firmware update. See section "Known Issues - Ultra96-V2 PMIC firmware update" below for more details.
Setup - Install the Xilinx Tools
This project requires the following tools:
- Vitis 2019.2 Unified Software Platform
- Docker
- Vitis-AI v1.1
Refer to Xilinx Vitis Unified Software Platform for instructions on installing Vitis 2019.2 on your Linux machine.
Refer to Install Docker for instructions on installing Docker on your Linux machine.
Next, clone v1.1 of the Vitis-AI git repository.
1. Clone Xilinx’s Vitis-AI github repository:
$ git clone https://github.com/Xilinx/Vitis-AI
$ cd Vitis-AI
$ git checkout v1.1
$ export VITIS_AI_HOME="$PWD"
Setup - Install the Avnet Vitis platforms
This guide can be used for any Vitis platform, which will be denoted by {platform}.
For example:
- ULTRA96V2 : Ultra96-V2 Development Board
- UZ7EV_EVCC : UltraZed-EV SOM (7EV) + FMC Carrier Card
- UZ3EG_IOCC : UltraZed-EG SOM (3EG) + IO Carrier Card
- UZ3EG_PCIEC : UltraZed-EG SOM (3EG) + PCIEC Carrier Card
1. Download the Vitis platform for the appropriate board using one of the links below, and extract it to the hard drive of your Linux machine:
- ULTRA96V2 : http://avnet.me/ultra96v2-vitis-2019.2
- UZ7EV_EVCC : http://avnet.me/uz7ev-evcc-vitis-2019.2
- UZ3EG_IOCC : http://avnet.me/uz3eg-iocc-vitis-2019.2
- UZ3EG_PCIEC : http://avnet.me/uz3eg-pciec-vitis-2019.2
2. Specify the location of the Vitis platform by creating the SDX_PLATFORM environment variable, which points to the location of the .xpfm file.
For the ULTRA96V2 platform, this should look similar to the following:
$ export SDX_PLATFORM=/home/Avnet/vitis/platform_repo/ULTRA96V2/ULTRA96V2.xpfm
For the UZ7EV_EVCC platform, this should look similar to the following:
$ export SDX_PLATFORM=/home/Avnet/vitis/platform_repo/UZ7EV_EVCC/UZ7EV_EVCC.xpfm
For the rest of this document, the platform will be denoted by {platform}.
Replace all instances of “{platform}” with the appropriate platform name, such as “ULTRA96V2”.
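Since every later command uses the {platform} placeholder, it can help to capture the name once in a shell variable and derive SDX_PLATFORM from it. This is a convenience sketch, not part of the official flow; the PLATFORM and PLATFORM_REPO names below are assumptions, so adjust the path to wherever you actually extracted the platform archive.

```shell
# Hypothetical convenience: set the platform name once, then derive the
# SDX_PLATFORM path from it. PLATFORM_REPO is an example location only.
PLATFORM=ULTRA96V2
PLATFORM_REPO=${HOME}/Avnet/vitis/platform_repo
export SDX_PLATFORM=${PLATFORM_REPO}/${PLATFORM}/${PLATFORM}.xpfm
echo "${SDX_PLATFORM}"
```

Changing PLATFORM to UZ7EV_EVCC (or another board name) then updates the exported path in one place.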
Step 1 - Build the Hardware Project
The creation of the Hardware Project is well documented on Xilinx’s Vitis-AI github repository, specifically the DPU-TRD section.
https://github.com/Xilinx/Vitis-AI/tree/v1.1/DPU-TRD
DPU TRD Vitis Flow
https://github.com/Xilinx/Vitis-AI/blob/v1.1/DPU-TRD/prj/Vitis/README.md
1. Make a copy of the DPU-TRD directory for your platform:
$ cd $VITIS_AI_HOME
$ cp -r DPU-TRD DPU-TRD-{platform}
$ export TRD_HOME=$VITIS_AI_HOME/DPU-TRD-{platform}
$ cd $TRD_HOME/prj/Vitis
Depending on which platform you are targeting, follow the instructions in one of the next two sections.
Step 1.1 – Build the Hardware Project for the ULTRA96V2, UZ3EG_IOCC, and UZ3EG_PCIEC platforms
1. Edit the dpu_conf.vh file to specify the architecture and configuration of the DPU, according to the available resources on the Vitis platform.
$ cd $TRD_HOME/prj/Vitis
$ vi dpu_conf.vh
Target a DPU with B2304 architecture, by making the following changes to the dpu_conf.vh file.
//`define B4096
`define B2304
Leave all other parameters the same
2. Edit the config_file/prj_config file, to specify the connectivity of the DPU core
$ vi config_file/prj_config
First, specify the number of DPU cores to instantiate in the design as 1.
[connectivity]
...
nk=dpu_xrt_top:1
Specify which frequencies to use for the 1x and 2x clocks
The Avnet Vitis platforms (ULTRA96V2, UZ3EG_IOCC, and UZ3EG_PCIEC) have the following clocks defined in their hardware design:
We will use the 150MHz & 300MHz clocks to connect the DPU.
[clock]
id=0:dpu_xrt_top_1.aclk
id=1:dpu_xrt_top_1.ap_clk_2
NOTE : the dpu_xrt_top_1.ap_clk_2 must be 2X the frequency of dpu_xrt_top_1.aclk
In order to connect up the DPU core, we also need to specify which AXI interconnects to use.
[connectivity]
sp=dpu_xrt_top_1.M_AXI_GP0:HPC0
sp=dpu_xrt_top_1.M_AXI_HP0:HP0
sp=dpu_xrt_top_1.M_AXI_HP2:HP1
NOTE : the same port can be specified twice, in which case an additional AXI interconnect will be added if needed.
Leave the other settings the same.
3. Build the DPU enabled hardware design
$ make KERNEL=DPU DEVICE={platform}
The make will build the individual DPU core, then build the complete hardware project.
The Vivado project will be located in the following directory:
DPU-TRD-{platform}/prj/Vitis/binary_container_1/link/vivado/vpl/prj/prj.xpr
4. The output binaries will be located in the following directory:
$ tree binary_container_1/sd_card
├── BOOT.BIN
├── dpu.xclbin
├── image.ub
├── README.txt
└── {platform}.hwh
NOTE : The .hwh file contains details about the hardware implementation, and will be used during model compilation.
Step 1.2 – Build the Hardware Project for the UZ7EV_EVCC platform
1. Edit the dpu_conf.vh file to specify the architecture and configuration of the DPU, according to the available resources on the Vitis platform.
$ cd $TRD_HOME/prj/Vitis
$ vi dpu_conf.vh
Keep the DPU architecture set to B4096.
`define B4096
Target usage of Ultra-RAM (specific to EV devices), by making the following changes to the dpu_conf.vh file.
//`define URAM_DISABLE
`define URAM_ENABLE
Target high RAM usage, by making the following changes to the dpu_conf.vh file.
//`define RAM_USAGE_LOW
`define RAM_USAGE_HIGH
Target high DSP48 usage, by making the following changes to the dpu_conf.vh file.
//`define DSP48_USAGE_LOW
`define DSP48_USAGE_HIGH
Leave the other parameters the same
2. Edit the config_file/prj_config file, to specify the connectivity of the DPU (and optionally SFM) cores
$ vi config_file/prj_config
First, specify the number of DPU cores to instantiate in the design as 1.
[connectivity]
...
nk=dpu_xrt_top:1
Specify which frequencies to use for the 1x and 2x clocks
The Avnet Vitis platforms (UZ7EV_EVCC) have the following clocks defined in their hardware design:
We will use the 200MHz & 400MHz clocks to connect the DPU core.
[clock]
id=4:dpu_xrt_top_1.aclk
id=5:dpu_xrt_top_1.ap_clk_2
NOTE : the dpu_xrt_top_1.ap_clk_2 must be 2X the frequency of dpu_xrt_top_1.aclk
In addition, specify the 200MHz clock to connect the SFM.
id=4:sfm_xrt_top_1.aclk
In order to connect up the DPU core (and SFM core), we also need to specify which AXI interconnects to use.
[connectivity]
sp=dpu_xrt_top_1.M_AXI_GP0:HPC0
sp=dpu_xrt_top_1.M_AXI_HP0:HP0
sp=dpu_xrt_top_1.M_AXI_HP2:HP1
NOTE : the same port can be specified twice, in which case an additional AXI interconnect will be added if needed.
Leave the other settings the same.
3. Build the DPU core + SFM core
$ make KERNEL=DPU_SM DEVICE={platform}
The make will build the individual DPU (and SFM) core(s), then build the complete hardware project.
The Vivado project will be located in the following directory:
DPU-TRD-{platform}/prj/Vitis/binary_container_1/link/vivado/vpl/prj/prj.xpr
4. The output binaries will be located in the following directory:
$ tree binary_container_1/sd_card
├── BOOT.BIN
├── dpu.xclbin
├── image.ub
├── README.txt
└── {platform}.hwh
NOTE : The .hwh file contains details about the hardware implementation, and will be used during model compilation.
Step 2 - Compile the Models from the Xilinx Model Zoo
The Xilinx Model Zoo is a repository of free pre-trained deep learning models, optimized for inference deployment on Xilinx™ platforms.
This project will concentrate on the models for which example applications have been provided. It is important to know the correlation between model and application. This table includes a non-exhaustive list of applications that were verified with corresponding models from the model zoo.
1. The first step is to download the pre-trained models from the Xilinx Model Zoo:
$ cd $VITIS_AI_HOME/AI-Model-Zoo
$ source ./get_model.sh
This will download version 1.1 of the model zoo ( all_models_1.1.zip )
2. Launch the tools docker from the Vitis-AI directory
$ cd $VITIS_AI_HOME
$ sh -x docker_run.sh xilinx/vitis-ai:latest-cpu
3. When prompted, read all the license notification messages, and press ENTER to accept the license terms.
4. Within the docker session, launch the "vitis-ai-caffe" Conda environment
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $
5. Create a modelzoo directory, and copy the hardware handoff (.hwh) file
(vitis-ai-caffe) $ cd DPU-TRD-{platform}
(vitis-ai-caffe) $ mkdir modelzoo
(vitis-ai-caffe) $ cd modelzoo
(vitis-ai-caffe) $ cp ../prj/Vitis/binary_container_1/sd_card/{platform}.hwh .
6. Use the dlet tool to generate your .dcf file
(vitis-ai-caffe) $ dlet -f {platform}.hwh
7. The previous step will generate a dcf with a name similar to dpu-11-18-2019-18-45.dcf. Rename this file to {platform}.dcf
(vitis-ai-caffe) $ mv dpu*.dcf {platform}.dcf
8. Create a file named “custom.json” with the following content
{"target": "dpuv2", "dcf": "./{platform}.dcf", "cpu_arch": "arm64"}
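Rather than typing the JSON by hand, the file can be generated with a here-document and checked for syntax before it is handed to the compiler. This is only a sketch; the ULTRA96V2 name below stands in for your actual {platform}, and the json.tool check is an optional sanity step (a stray quote here otherwise surfaces later as a confusing vai_c_caffe error).

```shell
# Sketch: generate custom.json for a given platform name, then verify that it
# parses as valid JSON before using it with --arch.
platform=ULTRA96V2
cat > custom.json <<EOF
{"target": "dpuv2", "dcf": "./${platform}.dcf", "cpu_arch": "arm64"}
EOF
python3 -m json.tool custom.json
```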
9. Create a directory for the compiled models
(vitis-ai-caffe) $ mkdir compiled_output
10. Create a generic recipe for compiling a caffe model, by creating a script named “compile_cf_model.sh” with the following content
model_name=$1
modelzoo_name=$2
vai_c_caffe \
--prototxt ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy.prototxt \
--caffemodel ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy.caffemodel \
--arch ./custom.json \
--output_dir ./compiled_output/${modelzoo_name} \
--net_name ${model_name} \
--options "{'mode': 'normal'}"
11. Compile the caffe model for the resnet50 application, using the generic script we just created:
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $ source ./compile_cf_model.sh resnet50 cf_resnet50_imagenet_224_224_7.7G
12. Compile the caffe model for the face_detection application, using the generic script we just created:
$ conda activate vitis-ai-caffe
(vitis-ai-caffe) $ source ./compile_cf_model.sh densebox cf_densebox_wider_360_640_1.11G
13. Create a generic recipe for compiling a tensorflow model, by creating a script called “compile_tf_model.sh” with the following content
model_name=$1
modelzoo_name=$2
vai_c_tensorflow \
--frozen_pb ../../AI-Model-Zoo/models/${modelzoo_name}/quantized/deploy_model.pb \
--arch ./custom.json \
--output_dir ./compiled_output/${modelzoo_name} \
--net_name ${model_name}
14. Compile the tensorflow models, using the generic script we just created:
$ conda activate vitis-ai-tensorflow
(vitis-ai-tensorflow) $ source ./compile_tf_model.sh tf_resnet50 tf_resnetv1_50_imagenet_224_224_6.97G
15. Verify the contents of the directory with the tree utility:
(vitis-ai-caffe) $ tree
├── compiled_output
│   ├── cf_densebox_wider_360_640_1.11G
│   │   ├── densebox_kernel_graph.gv
│   │   └── dpu_densebox.elf
│   ├── cf_resnet50_imagenet_224_224_7.7G
│   │   ├── dpu_resnet50_0.elf
│   │   └── resnet50_kernel_graph.gv
│   └── tf_resnetv1_50_imagenet_224_224_6.97G
│       ├── dpu_tf_resnet50_0.elf
│       └── tf_resnet50_kernel_graph.gv
├── compile_cf_model.sh
├── compile_tf_model.sh
├── custom.json
├── {platform}.dcf
└── {platform}.hwh
6 directories, 15 files
16. Exit the tools docker
(vitis-ai-caffe) $ exit
Step 3 - Compile the AI Applications
Vitis-AI 1.1 provides two different APIs: the DNNDK API and the VART API.
The DNNDK API is the low-level API used to communicate with the AI engine (DPU). This API is the recommended API for users that will be creating their own custom neural networks, targeted to the Xilinx devices.
The Vitis-AI RunTime (VART) API, and Vitis-AI-Library, provide a higher level of abstraction that simplifies development of AI applications. This API is recommended for users wishing to leverage the existing pre-trained models from the Xilinx Model Zoo in their custom applications.
Step 3.1 - Compile the DNNDK based AI Applications
This version of the tutorial only covers the DNNDK based examples, and is based on the documentation available on the Vitis-AI github repository, specifically the “mpsoc” section.
https://github.com/Xilinx/Vitis-AI/tree/v1.1/mpsoc
1. Change to the DPU-TRD-{platform} work directory.
$ cd DPU-TRD-{platform}
2. Download and install the SDK for cross-compilation, specifying a unique and meaningful installation destination (knowing that this SDK will be specific to the Vitis-AI 1.1 DNNDK samples)
$ wget -O sdk.sh https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh
$ chmod +x sdk.sh
$ ./sdk.sh -d ~/petalinux_sdk_vai_1_1_dnndk
3. Setup the environment for cross-compilation
$ unset LD_LIBRARY_PATH
$ source ~/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
4. Download and extract the additional DNNDK runtime content to the previously installed SDK
$ wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
$ tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
5. Install the additional DNNDK runtime content to the previously installed SDK
$ cd vitis-ai_v1.1_dnndk
$ ./install.sh $SDKTARGETSYSROOT
6. Make a working copy of the “vitis_ai_dnndk_samples” directory.
$ cp -r ../mpsoc/vitis_ai_dnndk_samples .
7. Download and extract the additional content (images and video files) for the DNNDK samples.
$ wget -O vitis-ai_v1.1_dnndk_sample_img.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk_sample_img.tar.gz
$ tar -xvzf vitis-ai_v1.1_dnndk_sample_img.tar.gz
FACE_DETECTION
8. For the face_detection application, create a model directory and copy the dpu_*.elf model files we previously built
$ cd $TRD_HOME/vitis_ai_dnndk_samples/face_detection
$ mkdir model_for_{platform}
$ cp ../../modelzoo/compiled_output/cf_densebox_wider_360_640_1.11G/dpu_*.elf model_for_{platform}/.
9. For the face_detection application, edit the src/main.cc file to add the following two camera.set() calls after the VideoCapture initialization:
VideoCapture camera(0);
if (!camera.isOpened()) {
cerr << "Open camera error!" << endl;
exit(-1);
}
camera.set(CV_CAP_PROP_FRAME_WIDTH, 640);
camera.set(CV_CAP_PROP_FRAME_HEIGHT, 480);
10. For the face_detection application, copy the “model_for_{platform}” directory to “model”, then run the “make” command
$ cp -r model_for_{platform} model
$ make
NOTE : You could also edit the build.sh script to add support for the new {platform}. This is left as an optional exercise for the user. If you prefer to modify the build.sh script to add your platform, refer to the pre-built solutions for an example of how this was done.
RESNET50
11. For the resnet50 application, create a model directory and copy the dpu_*.elf model files we previously built
$ cd $TRD_HOME/vitis_ai_dnndk_samples/resnet50
$ mkdir model_for_{platform}
$ cp ../../modelzoo/compiled_output/cf_resnet50_imagenet_224_224_7.7G/dpu_*.elf model_for_{platform}/.
12. For the resnet50 application, copy the “model_for_{platform}” directory to “model”, then run the “make” command
$ cp -r model_for_{platform} model
$ make
TF_RESNET50
13. For the tf_resnet50 application, create a model directory and copy the dpu_*.elf model files we previously built
$ cd $TRD_HOME/vitis_ai_dnndk_samples/tf_resnet50
$ mkdir model_for_{platform}
$ cp ../../modelzoo/compiled_output/tf_resnetv1_50_imagenet_224_224_6.97G/dpu_*.elf model_for_{platform}/.
14. For the tf_resnet50 application, copy the “model_for_{platform}” directory to “model”, then run the “make” command
$ cp -r model_for_{platform} model
$ make
Step 4 - Create the SD card
1. Create a “sdcard” directory
$ cd DPU-TRD-{platform}
$ mkdir sdcard
2. Copy the design files (hardware + petalinux) for the DPU design to the “sdcard” directory.
$ cp prj/Vitis/binary_container_1/sd_card/* sdcard/.
3. Copy the applications to the “sdcard” directory
$ cp -r vitis_ai_dnndk_samples sdcard/.
4. Copy the Vitis-AI runtime for DNNDK to the “sdcard/runtime” directory
$ mkdir sdcard/runtime
$ cp -r vitis-ai_v1.1_dnndk sdcard/runtime/.
5. At this point, your “sdcard” directory should have the following contents
$ tree sdcard
6. Copy the contents of the “sdcard” directory to the boot partition of the SD card
7. If applicable (i.e. ULTRA96V2), extract the “rootfs.tar.gz” to the second partition of the SD card
Step 5 - Execute the AI applications on hardware
1. Boot the target board with the SD card that was created in the previous section
2. If prompted for a login, specify “root” as login and password.
3. Navigate to the sdcard folder
a. For the ULTRA96V2, this can be done as follows:
$ cd /run/media/mmcblk0p1
b. For the UZ7EV_EVCC, UZ3EG_IOCC, and UZ3EG_PCIEC, this can be done as follows:
$ cd /run/media/mmcblk1p1
4. Copy the dpu.xclbin file to the /usr/lib directory
$ cp dpu.xclbin /usr/lib/.
5. Install the Vitis-AI embedded package
$ cd runtime/vitis-ai_v1.1_dnndk
$ source ./install.sh
If the dpu.xclbin file is not manually copied to the /usr/lib directory, the install.sh script will generate an error message, since it will attempt to copy it from the /mnt directory.
cp: cannot stat ‘/mnt/dpu.xclbin’: No such file or directory
The install.sh script may also fail to install the python support, which is not critical for this tutorial
Warning: pip3 command not found, skip install python support
6. If prompted for a login, again, specify “root” as login and password
7. Re-navigate to the sdcard directory
8. Validate the Vitis-AI board package with the dexplorer utility
$ dexplorer --whoami
[DPU IP Spec]
IP Timestamp : 2020-03-26 13:30:00
DPU Core Count : 1
[DPU Core Configuration List]
DPU Core : #0
DPU Enabled : Yes
DPU Arch : B2304
DPU Target Version : v1.4.1
DPU Frequency : 300 MHz
Ram Usage : Low
DepthwiseConv : Enabled
DepthwiseConv+Relu6 : Enabled
Conv+Leakyrelu : Enabled
Conv+Relu6 : Enabled
Channel Augmentation : Enabled
Average Pool : Enabled
NOTE : Even if you have built the design for frequencies other than 150MHz/300MHz for the DPU, the dexplorer utility will still report 300MHz.
9. Define the DISPLAY environment variable
$ export DISPLAY=:0.0
10. Change the resolution of the DP monitor to 640x480
$ xrandr --output DP-1 --mode 640x480
11. Launch the DNNDK API based sample applications
$ cd vitis_ai_dnndk_samples
a. Launch the face_detection application
$ cd face_detection
$ ./face_detection
b. Press <CTRL-C> to exit the application
<CTRL-C>
$ cd ..
c. Launch the caffe version of the resnet50 application
$ cd resnet50
$ ./resnet50
d. Wait for application to finish, or Press <CTRL-C> to exit
<CTRL-C>
$ cd ..
e. Launch the tensorflow version of the resnet50 application
$ cd tf_resnet50
$ ./tf_resnet50
f. Wait for application to finish, or Press <CTRL-C> to exit
<CTRL-C>
$ cd ..
Solution – Pre-built SD card images
For convenience, pre-built SD card images have been created for the following Avnet platforms:
- ULTRA96V2 : http://avnet.me/ultra96v2-vitis-ai-1.1-image (MD5SUM = 24abc163ea04874f97826d4d28e7ce2e)
- UZ7EV_EVCC : http://avnet.me/uz7ev-evcc-vitis-ai-1.1-image (MD5SUM = c9fff42ef9954252ec2f9eb59b88ac3c)
- UZ3EG_IOCC : http://avnet.me/uz3eg-iocc-vitis-ai-1.1-image (MD5SUM = d4448130a5ea7950552bc41d72e94651)
- UZ3EG_PCIEC : http://avnet.me/uz3eg-pciec-vitis-ai-1.1-image (MD5SUM = b3aa17a4304c554b60bf5e87220352a4)
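Before flashing an image it is worth confirming the download against its published MD5 sum, since a truncated download produces a board that fails to boot in confusing ways. The sketch below uses a small stand-in file so it runs anywhere; for a real image, the checksum file would contain the published MD5SUM value followed by two spaces and the downloaded filename.

```shell
# Sketch: verify a download against a known MD5 sum before writing it to SD card.
# demo.img is a stand-in file; substitute the published sum and real filename.
printf 'example image contents' > demo.img
md5sum demo.img > demo.md5     # normally you would paste the published sum here
md5sum -c demo.md5             # reports "demo.img: OK" only when the sums match
```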
The following table describes the applications that are provided on the pre-built SD card images, as well as the command used to launch each of them:
NOTE : mmcblk#p1 denotes one of either mmcblk0p1 or mmcblk1p1, depending on which platform is being tested.
1. For the DNNDK API based AI applications, navigate to the application’s directory, and execute the provided command.
The pre-built solutions have been built with the following DPU configurations:
- UZ7EV_EVCC : B4096 (high RAM usage, high DSP48 usage), 200MHz/400MHz
- ULTRA96-V2 : B2304 (low RAM usage, low DSP48 usage), 200MHz/400MHz
- UZ3EG_IOCC : B2304 (low RAM usage, low DSP48 usage), 150MHz/300MHz
- UZ3EG_PCIEC : B2304 (low RAM usage, low DSP48 usage), 150MHz/300MHz
The pre-built solutions provide compiled models for the following two (2) DPU configurations:
- B2304_lr : B2304 DPU with low RAM usage
- B4096_hr : B4096 DPU with high RAM usage
The build.sh scripts have been modified as follows to support the four (4) Avnet platforms.
...
elif [ "$TestBoard" = "ULTRA96V2" ] || [ "$TestBoard" = "UZ3EG_IOCC" ] || [ "$TestBoard" = "UZ3EG_PCIEC" ]; then
if [ -e ./model_for_B2304_lr ]; then
echo "copy B2304 (low RAM usage) model file..."
cp -r model_for_B2304_lr ./model
else
echo "The folder named 'model_for_B2304_lr' does not exist!"
exit 1
fi
elif [ "$TestBoard" = "UZ7EV_EVCC" ]; then
if [ -e ./model_for_B4096_hr ]; then
echo "copy B4096 (high RAM usage) model file..."
cp -r model_for_B4096_hr ./model
else
echo "The folder named 'model_for_B4096_hr' does not exist!"
exit 1
fi
else
...
Known Issues – Ultra96-V2 PMIC firmware update
For the case of the Ultra96-V2 Development Board, an important PMIC firmware update is required to run all of the AI applications.
Without the PMIC firmware update, the following AI applications will cause periodic peak current that exceeds the default 4A fault threshold, causing the power on reset to assert, and thus the board to reboot.
- adas_detection
- inception_v1_mt
- resnet50_mt
- segmentation
- video_analysis
The PMIC firmware update increases this fault threshold, and prevents the reboot from occurring.
In order to update the PMIC firmware of your Ultra96-V2 development board, refer to the following instructions:
If you are unable to update the PMIC firmware on your Ultra96-V2, but still want to run all of the AI applications, you can make use of the following script (from Xilinx) to reduce the frequency of the DPU:
This script will reduce the frequency of the PL_CLK0 (100MHz) clock source that feeds the Clock Wizard, which generates the multiple clock frequencies available to Vitis.
The recommended setting for the Ultra96-V2 without PMIC firmware update is to reduce the DPU frequencies down to 125MHz/250MHz.
If you have built the design for 150MHz/300MHz, you should use a value of 83%.
If you have built the design for 200MHz/400MHz (i.e. the pre-built solution), you should use a value of 65%.
1. Execute the following commands after boot:
$ dpu_clk
Real PL0_CLK 100000000
DPU Performance 100.0%
$ dpu_clk 65
$ dpu_clk
Real PL0_CLK 65217391
DPU Performance 65.2%
This will set the PL_CLK0 frequency to approximately 65MHz, and thus the DPU frequencies to approximately 65% of their original values.
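As a worked example of the arithmetic behind these percentages: the dpu_clk readout of 65217391 Hz suggests that PL_CLK0 is divided down from a 1.5 GHz PLL clock with an integer divider (this is an assumption, but it is consistent with the readout, since 1.5 GHz / 23 = 65217391 Hz). Under that assumption, the achievable frequency for a given percentage can be sketched as:

```shell
# Sketch of the dpu_clk percentage math, assuming an integer divider from a
# 1.5 GHz PLL clock (consistent with the 65217391 Hz readout shown above).
pct=65
awk -v pct="$pct" 'BEGIN {
  target = 100000000 * pct / 100            # requested PL_CLK0, in Hz
  div    = int(1500000000 / target + 0.5)   # nearest integer divider
  actual = int(1500000000 / div)            # achievable PL_CLK0
  printf "divider=%d actual=%d Hz (%.1f%%)\n", div, actual, actual / 1000000
}'
```

With pct=65 this prints divider=23 actual=65217391 Hz (65.2%), matching the dpu_clk output above; the same math explains why the reported percentage is 65.2% rather than exactly 65%.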
Known Issues – adas_detection
Attempting to compile the “yolo” model will result in errors.
(vitis-ai-caffe) $ source ./compile_cf_model.sh yolo dk_yolov3_cityscapes_256_512_0.9_5.46G
NOTE : In version 1.0 of the Xilinx AI Model Zoo, the models had an additional directory called “compiler”, where these edits were already provided. In version 1.1 of the Xilinx AI Model Zoo, this “compiler” directory was removed, so the edits need to be done manually.
dk_yolov3_cityscapes_256_512_0.9_5.46G
When this model is converted from Darknet, during quantization, two additional terms, “yolo_height” and “yolo_width”, are automatically added in the first “Input” layer of the model.
1. These two lines need to be deleted from “quantized/deploy.prototxt”, for model compilation.
yolo_height: 256
yolo_width: 512
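The two lines can be removed with a single sed command instead of hand-editing. The sketch below creates a small stand-in prototxt so it is self-contained; against the real model, run the same sed command on quantized/deploy.prototxt (the stand-in content is illustrative, not the actual file layout).

```shell
# Sketch: strip the extra yolo_* lines with sed, shown on a stand-in file.
cat > deploy.prototxt <<'EOF'
input: "data"
yolo_height: 256
yolo_width: 512
input_shape { dim: 1 dim: 3 dim: 256 dim: 512 }
EOF
sed -i '/yolo_height:/d; /yolo_width:/d' deploy.prototxt
cat deploy.prototxt   # the yolo_* lines are gone, the rest is untouched
```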
Known Issues – mobilenet
Attempting to compile the “mobilenet” model will result in errors.
(vitis-ai-caffe) $ source ./compile_cf_model.sh mobilenet cf_mobilenetv2_imagenet_224_224_0.59G
NOTE : In version 1.0 of the Xilinx AI Model Zoo, the models had an additional directory called “compiler”, where these edits were already provided. In version 1.1 of the Xilinx AI Model Zoo, this “compiler” directory was removed, so the edits need to be done manually.
cf_mobilenetv2_imagenet_224_224_0.59G
For the mobilenet model, the "Softmax" layer must be deleted.
1. The following lines need to be deleted from “quantized/deploy.prototxt”, for model compilation.
layer {
name: "prob"
type: "Softmax"
bottom: "417"
top: "prob"
}
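Deleting a whole layer block by hand is error-prone, so a small awk filter can drop any top-level layer whose name matches. This is a sketch on a stand-in file, and it assumes each top-level layer block ends with an unindented closing brace at the start of a line, as the model-zoo deploy.prototxt files do; verify the result before compiling.

```shell
# Sketch: drop the layer named "prob" (the Softmax) from a prototxt file.
# Assumes each top-level "layer {" block ends with an unindented "}".
cat > deploy.prototxt <<'EOF'
layer {
  name: "conv1"
  type: "Convolution"
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "417"
  top: "prob"
}
EOF
awk '
/^layer [{]/ { buf = $0; drop = 0; next }      # start buffering a layer block
buf != ""    {
  buf = buf "\n" $0
  if ($0 ~ /name: "prob"/) drop = 1            # mark the block for deletion
  if ($0 == "}") { if (!drop) print buf; buf = "" }
  next
}
{ print }                                      # lines outside any layer block
' deploy.prototxt > deploy.pruned.prototxt
cat deploy.pruned.prototxt
```

The same filter works for the SSD edits below by changing the matched layer names.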
Known Issues – pose_detection
Attempting to compile the “ssd_person” model will result in errors.
(vitis-ai-caffe) $ source ./compile_cf_model.sh ssd_person cf_ssdpedestrian_coco_360_640_0.97_5.9G
NOTE : In version 1.0 of the Xilinx AI Model Zoo, the models had an additional directory called “compiler”, where these edits were already provided. In version 1.1 of the Xilinx AI Model Zoo, this “compiler” directory was removed, so the edits need to be done manually.
cf_ssdpedestrian_coco_360_640_0.97_5.9G
For the SSD models, the last “Reshape”, “Softmax”, “Flatten”, and “DetectionOutput” layers need to be deleted.
1. The following lines need to be deleted from “quantized/deploy.prototxt”, for model compilation.
layer {
name: "mbox_conf_reshape"
type: "Reshape"
bottom: "mbox_conf"
top: "mbox_conf_reshape"
include {
phase: TEST
}
reshape_param {
shape {
dim: 0
dim: -1
dim: 2
}
}
}
layer {
name: "mbox_conf_softmax"
type: "Softmax"
bottom: "mbox_conf_reshape"
top: "mbox_conf_softmax"
include {
phase: TEST
}
softmax_param {
axis: 2
}
}
layer {
name: "mbox_conf_flatten"
type: "Flatten"
bottom: "mbox_conf_softmax"
top: "mbox_conf_flatten"
include {
phase: TEST
}
flatten_param {
axis: 1
}
}
layer {
name: "detection_out"
type: "DetectionOutput"
bottom: "mbox_loc"
bottom: "mbox_conf_flatten"
bottom: "mbox_priorbox"
top: "detection_out"
include {
phase: TEST
}
detection_output_param {
num_classes: 2
share_location: true
background_label_id: 0
nms_param {
nms_threshold: 0.5
top_k: 400
}
code_type: CENTER_SIZE
keep_top_k: 200
confidence_threshold: 0.01
}
}
SSD application code
Furthermore, the SSD application code for the DNNDK based application will not work correctly with the SSD model from v1.1 of the Xilinx AI model zoo.
This issue has been fixed with the following update:
https://github.com/Xilinx/Vitis-AI/commit/7285d5f78cbe4add65e864b46f53ae120d04b6c5
Update pose_detection of dnndk sample to use the same model with VART sample
The following image illustrates the output from the original code (top image) and corrected code (bottom image).
Known Issues – video_analysis
Attempting to compile the “ssd” model will result in errors.
(vitis-ai-caffe) $ source ./compile_cf_model.sh ssd cf_ssdtraffic_360_480_0.9_11.6G
NOTE : In version 1.0 of the Xilinx AI Model Zoo, the models had an additional directory called “compiler”, where these edits were already provided. In version 1.1 of the Xilinx AI Model Zoo, this “compiler” directory was removed, so the edits need to be done manually.
cf_ssdtraffic_360_480_0.9_11.6G
For the SSD models, the last “Reshape”, “Softmax”, “Flatten”, and “DetectionOutput” layers need to be deleted.
1. The following lines need to be deleted from “quantized/deploy.prototxt”, for model compilation.
layer {
name: "mbox_conf_reshape"
type: "Reshape"
bottom: "mbox_conf"
top: "mbox_conf_reshape"
reshape_param {
shape {
dim: 0
dim: -1
dim: 4
}
}
}
layer {
name: "mbox_conf_softmax"
type: "Softmax"
bottom: "mbox_conf_reshape"
top: "mbox_conf_softmax"
softmax_param {
axis: 2
}
}
layer {
name: "mbox_conf_flatten"
type: "Flatten"
bottom: "mbox_conf_softmax"
top: "mbox_conf_flatten"
flatten_param {
axis: 1
}
}
layer {
name: "detection_out"
type: "DetectionOutput"
bottom: "mbox_loc"
bottom: "mbox_conf_flatten"
bottom: "mbox_priorbox"
top: "detection_out"
include {
phase: TEST
}
detection_output_param {
num_classes: 4
share_location: true
background_label_id: 0
nms_param {
nms_threshold: 0.45
top_k: 400
}
code_type: CENTER_SIZE
keep_top_k: 200
confidence_threshold: 0.01
}
}
2. Furthermore, the original input definition (named “test”) must be deleted from “quantized/deploy.prototxt”
name:"test"
input:"data"
input_shape{
dim:1
dim:3
dim:360
dim:480
}
The previously deleted lines need to be replaced by the same “Input” layer as the cf_ssdpedestrian_coco_360_640_0.97_5.9G model, replacing the “640” width with “480”.
3. Add the following lines to the start of “quantized/deploy.prototxt”
layer {
name: "data"
type: "Input"
top: "data"
transform_param {
mean_value: 104
mean_value: 117
mean_value: 123
force_color: true
resize_param {
prob: 1
resize_mode: WARP
height: 360
width: 480
interp_mode: LINEAR
}
}
input_param {
shape {
dim: 1
dim: 3
dim: 360
dim: 480
}
}
}
Known Issues – python applications
The following python applications were successfully verified on the ULTRA96V2 platform:
- inception_v1_mt_py
- miniresnet_py
- resnet50_mt_py
However, they will not run on the UZ3EG_IOCC, UZ3EG_PCIEC, and UZ7EV_EVCC, since the pip3 utility is not included in their petalinux project. This issue will be addressed in Part 2 of this tutorial.