Vision-based machine learning inference is a hot topic, with implementations being used at the edge for a range of applications, from vehicle detection to pose tracking and the classification and identification of people, objects, and animals.
Due to the complexity of convolutional neural networks, implementing machine learning inference can be computationally intensive, which makes achieving high frame rates challenging on traditional computational architectures. Heterogeneous System on Chips like the Zynq and Zynq MPSoC, which combine high-performance ARM processors with programmable logic, offer a solution which can significantly accelerate performance.
The challenge has previously been creating a programmable logic implementation that is easy to use and works with common machine learning frameworks, e.g. Caffe and TensorFlow.
The Deep Neural Network Development Kit
To address the need to work with common industry frameworks and to enable acceleration in programmable logic without implementing the entire network from scratch, Deephi (owned by Xilinx) developed the Deep Neural Network Development Kit (DNNDK).
The DNNDK is based on C/C++ APIs and allows us to work with common industry standard frameworks, and with popular networks including VGG, ResNet, GoogLeNet, YOLO, SSD, and MobileNet.
At the heart of the DNNDK, enabling the acceleration of the deep learning algorithms, is the deep learning processor unit (DPU). On our Zynq or Zynq MPSoC system, the DPU resides in the programmable logic. To support different deep learning acceleration requirements, there are several different DPU variants which can be implemented.
The basic stages of deploying an AI/ML application on a Zynq / Zynq MPSoC using the DNNDK are:
- Compress the neural network model — Takes the network model (prototxt) and trained weights (Caffe) and produces a quantized model which uses INT8 representation. To achieve this, a small input training set is also typically required — this contains 100 to 1000 images.
- Compile the neural network model — This generates the ELF files necessary for the DPU instantiations. It will also identify elements of the network which are not supported by the DPU, so they can be implemented on the CPU.
- Create the program using DNNDK APIs — With the DPU kernels created, we can now build the application which manages the inputs and outputs, performs DPU kernel life-cycle management, and handles DPU task management (a minimal sketch of this life cycle is shown after this list). During this stage, we also need to implement on the CPU any network elements not supported by the DPU.
- Compile the hybrid DPU application — Once the application is ready, we can run the hybrid compiler, which will compile the CPU code and link it to the ELF files for the DPU kernels within the programmable logic.
- Run the hybrid DPU executable on our target.
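To give a feel for what the application code in the third step looks like, below is a minimal sketch of the DPU kernel and task life cycle using the N2Cube C/C++ API. The kernel name example_net is a placeholder, and the exact calls and header paths depend on your network and DNNDK version, so treat this as an outline rather than a drop-in implementation.

// Minimal sketch of the DPU kernel / task life cycle using the N2Cube API.
// "example_net" is a placeholder for the kernel name generated by DNNC.
#include <dnndk/dnndk.h>

int main() {
    dpuOpen();                                        // attach to the DPU driver

    DPUKernel *kernel = dpuLoadKernel("example_net"); // load the DNNC-generated kernel
    DPUTask *task = dpuCreateTask(kernel, 0);         // create a task to run it

    // ... set the input tensor here, e.g. from an OpenCV image ...

    dpuRunTask(task);                                 // run the network on the DPU

    // ... read back the output tensors and post-process on the CPU ...

    dpuDestroyTask(task);                             // release the task
    dpuDestroyKernel(kernel);                         // release the kernel
    dpuClose();                                       // detach from the DPU driver
    return 0;
}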
To perform these five steps, DNNDK provides several different tools, which are split between the host and the target.
On the host side, we are offered the following tools:
- DECENT — Deep compression tool, performs the compression of the network model (a conceptual sketch of the INT8 quantization involved is shown after these tool lists).
- DNNC — Deep neural network compiler, performs the network compilation. DNNC has a sub-component, DNNAS (the deep neural network assembler), which generates the ELF files for the DPU.
While on the target side:
- N2Cube — This is the DPU run time engine and provides loading of DNNDK applications, scheduling, and resource allocation. Core components of the N2Cube include the DPU driver, DPU loader, and DPU tracer.
- DExplorer — Provides DPU info during run time.
- DSight — Profiling tool that provides visualization based on the DPU tracer information.
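To make the idea of INT8 quantization a little more concrete, the sketch below shows one simple way a floating-point tensor can be mapped to 8-bit integers using a single scale factor. This is purely a conceptual illustration, not DECENT's actual algorithm; the quantize function and its scaling scheme are my own assumptions for demonstration.

// Conceptual illustration of INT8 quantization with a single scale factor.
// This is NOT DECENT's actual algorithm, just a simple demonstration of the idea.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<int8_t> quantize(const std::vector<float> &values, float &scale) {
    // Choose a scale so the largest magnitude maps onto the INT8 range [-127, 127]
    float max_abs = 0.0f;
    for (float v : values) max_abs = std::max(max_abs, std::fabs(v));
    scale = (max_abs > 0.0f) ? (127.0f / max_abs) : 1.0f;

    std::vector<int8_t> quantized;
    quantized.reserve(values.size());
    for (float v : values) {
        // Round to the nearest integer and clamp into range
        long q = std::lround(v * scale);
        q = std::max(-127L, std::min(127L, q));
        quantized.push_back(static_cast<int8_t>(q));
    }
    return quantized; // recover approximate values later with: value = q / scale
}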
In this project we are going to take a look at how we can get the examples up and running on both the Ultra96 and the ZCU104 board.
Once we have these up and running, we will make some simple modifications to one of the examples so that we can see it running from a live video feed.
Setting Up
Before we can see our first examples up and running, we need to download the DNNDK from the Xilinx website.
Once we have the DNNDK downloaded, the next step is to extract the compressed file so we can see its contents.
In the unzipped directory you will notice the following directories:
- Common - Contains a number of images for classification
- Host_x86 - The host development tools, e.g. DECENT and DNNC
- DP-8020 / DP-N1 / ZCU102 / ZCU104 / Ultra96 - These are the tool chains and example applications for the named reference boards.
To set up the boards, we first need to download the Linux images for the boards of interest from the Xilinx Deephi website.
Once these are downloaded, we can uncompress the image and write it to an SD card for either the Ultra96 or the ZCU104 (make sure you use the correct image for the board).
To do this, I used a program such as Disk Imager or Etcher.
Once the board has booted the image, we will also need to be able to transfer the downloaded DNNDK examples and tools to the running board.
To do this we will be using MobaXterm, which allows us to transfer files to the target board when it is connected to a network.
The next thing to do is set up the boards. Both processes are very similar, with a slight adaptation for the Ultra96 and its WiFi connection.
Setting Up the ZCU104
The first board we will examine is the ZCU104. Once the SD card with the image is ready, double check the boot mode switches on the ZCU104 and power on the board.
Ensure the ZCU104 is connected to an Ethernet connection, and you will see the boot process complete after a few seconds.
If at any point you are asked for a username and password, they are both root.
Once the Linux image is up and running, the first thing to do is determine the IP address assigned. We can do this by entering the command:
ifconfig
Now the board is up and running and connected to the network, we can use MobaXterm to transfer the tools and examples for the ZCU104.
In MobaXterm, enter the command:
scp -r /ZCU104 root@<ZCU104 IP address>:~/
This will transfer the files from the development machine to the ZCU104.
We can then install the DNNDK and its examples using either an SSH or serial terminal.
Using a terminal, you will notice a new directory has been installed called ZCU104. Under here you will see an install script; run the script using the command:
./install.sh
This will start the installation process, as shown below.
As part of the installation process, you will see the Linux image restart.
We can now work with the example applications which are provided for that particular board.
However, for each example we first need to compile it, which is achieved by running make in the directory of the example you wish to see.
We set up the Ultra96 in a similar manner to the ZCU104; however, we need to make a few adaptations for the WiFi.
Once the initial Ultra96 image has booted, to be able to install the DNNDK tools and applications we first need to connect to the WiFi.
To do this, we need to use a USB hub which allows us to connect a mouse and keyboard, and connect the Mini DisplayPort output to a suitable monitor.
Power on the Ultra96 and you will see it boot to a desktop environment.
To connect to the WiFi, click on the menu and select Internet -> Wicd Network Manager. This will open the network manager, which lists all of the available networks.
Select the one you want and click Connect. If the network is encrypted, it will issue a warning and ask for the password.
Once the password is entered, click OK and then Connect. Connecting to a network might take a few seconds.
We can again use MobaXterm to upload the Ultra96 directory, and then install the tools and applications on the board as we did for the ZCU104:
scp -r /Ultra96 root@<Ultra96 IP address>:~/
Again, we install the Ultra96 tools and applications by running the command:
./install.sh
We are now ready to start working with our boards. For the rest of this project we will focus on the Ultra96; however, the process for the ZCU104 is identical.
Building and Running Examples
We can run the DNNDK examples on the Ultra96 using the mouse, keyboard, and a terminal window on the Ultra96 desktop, which proves very useful.
As mentioned above, to run each example we first need to build it by running make in the application directory.
However, there are other tools which come with the DNNDK that we may want to make use of.
We can use dexplorer to obtain information on the DPU cores included in the PL. Entering the command below will list the DPU configuration:
dexplorer -w
This command will show the type of DPU implemented in the PL; we can see this below for the ZCU104 and Ultra96 implementations.
We can also use the status option, which will list the status of the DPUs:
dexplorer -s
Another interesting option is profiling mode, which will generate profiling information.
To start profiling, we run the command:
dexplorer -m profile
Once the application has completed, we can convert the profile information into HTML using the command below. Each trace file name will include the PID of the process that generated it.
dsight -p dpu_trace_[PID].prof
When we run an application example, we will also capture the profile information.
Running the ADAS Example
Let's take an example and run the Ultra96 ADAS example. To run this, we need to use the following commands:
cd Ultra96/samples/adas_detection
make
dexplorer -m profile
./adas_detection video/adas.avi
This will run the application, offloading elements of the network to the DPU in the programmable logic.
When we run this application, a video window will appear on the desktop and you will see the video running with cars detected, identified, and tracked.
Once the application has completed, we can view the profile information by converting it to HTML using the command:
dsight -p dpu_trace_[PID].prof
Running this on the profile information captured when running the ADAS implementation results in the graph below.
With the applications and tools all running on our boards, we can start to develop our own applications. For this we have two choices: train new network weights using TensorFlow / Caffe, DECENT, and DNNC, or adapt one of the examples.
To wrap this project up, that is exactly what we are going to do.
We are going to adapt the pose_detection example to run from a live stream rather than a video file. This is a pretty simple modification to make, and we can do so with the changes below.
Update the declaration of the video capture as below:
VideoCapture cap(0);
Modify the main function as below:
int main(int argc, char **argv) {
    // Attach to DPU driver and prepare for running
    dpuOpen();

    if (!cap.isOpened()) {
        return -1;
    }
    cap.set(CV_CAP_PROP_FRAME_WIDTH, 640);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, 320);

    // Run tasks for SSD
    array<thread, 4> threads = {thread(Read, ref(is_reading)),
                                thread(runGestureDetect, ref(is_running_1)),
                                thread(runGestureDetect, ref(is_running_1)),
                                thread(Display, ref(is_displaying))};
    for (int i = 0; i < 4; ++i) {
        threads[i].join();
    }

    // Detach from DPU driver and release resources
    dpuClose();
    cap.release();
    return 0;
}
Modify the Read function as below:
void Read(bool &is_reading) {
    while (is_reading) {
        Mat img;
        if (read_queue.size() < 30) {
            if (!cap.read(img)) {
                cout << "Finish reading the video." << endl;
                is_reading = false;
                break;
            }
            mtx_read_queue.lock();
            read_queue.push(make_pair(read_index++, img));
            mtx_read_queue.unlock();
        } else {
            usleep(20);
        }
    }
}
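For reference, the modified Read function relies on a handful of globals which are already declared in the pose_detection example; a representative set is sketched below. The names match the snippet above, but the exact types in the shipped example may differ slightly, so treat these as assumptions.

// Representative globals used by the modified Read function. These already exist
// in the pose_detection example; the exact types there may differ slightly.
#include <mutex>
#include <queue>
#include <utility>
#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

VideoCapture cap(0);               // live camera rather than a video file
queue<pair<int, Mat>> read_queue;  // frames waiting to be processed
mutex mtx_read_queue;              // protects access to read_queue
int read_index = 0;                // index of the next frame to be read
bool is_reading = true;            // cleared to stop the Read thread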
We can then compile the application again using the make command in the pose_detection directory:
make
dexplorer -m profile
./pose_detection
When I ran this on my Ultra96, with my wife doing a little modelling for the testing, I captured the video below, which shows the algorithm working pretty well, I think.
The final thing to do was to observe the profile information.
This project has hopefully provided a good introduction to the DNNDK and how we can easily get it up and running. We can adapt the existing applications if we so desire.
We will examine training a new network for deployment in another project soon.
References
See previous projects here.
Additional Information on Xilinx FPGA / SoC Development can be found weekly on MicroZed Chronicles.
You can find the files associated with this project here: