This guide provides detailed instructions on implementing face detection and face tracking in Python on the Ultra96-V2 platform.
This tutorial builds on top of the following “Vitis-AI 1.1 flow for Avnet Vitis Platforms” two-part tutorial:
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-1-007b0e
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-2-f18be4
Although this tutorial specifically targets the Ultra96-V2, it can be adapted to any of the following platforms:
- Ultra96-V2 Development Board
- UltraZed-EV SOM (7EV) + FMC Carrier Card
- UltraZed-EG SOM (3EG) + IO Carrier Card
- UltraZed-EG SOM (3EG) + PCIEC Carrier Card
In this tutorial, we will build the following AI pipeline, implemented in Python, which can serve as a basis for future algorithm exploration.
There are many algorithms that can be used for face detection:
- Haar Cascade
- Histogram of Oriented Gradients (HOG) + Support Vector Machines (SVM)
- Deep Neural Networks (DNN) : Single-Shot Detectors (SSD), DenseBox, etc…
There are also several algorithms that can be used for object tracking:
- Optical flow
- Kalman filtering
- Meanshift / Camshift
One possible strategy when combining a “detection” algorithm with a “tracking” algorithm is to take advantage of the fact that “tracking” algorithms are generally faster than “detection” algorithms. Such an implementation could perform one “detection” iteration, followed by several “tracking” iterations, in order to make the best use of limited compute resources, as sketched below.
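As a rough illustration of that strategy (which, again, is not the one used in this tutorial), the scheduling could look something like the following sketch, where capture_frame(), detect_faces(), tracker and display() are hypothetical placeholders for a camera source, a detector, a tracker and a display sink:

# Sketch of the "detect once, then track for several frames" scheduling.
# capture_frame, detect_faces, tracker and display are hypothetical placeholders.
DETECT_EVERY_N = 10   # run the expensive detector every N frames

frame_count = 0
while True:
    frame = capture_frame()
    if frame_count % DETECT_EVERY_N == 0:
        # slow "detection" pass: find faces and (re-)initialize the tracker
        boxes = detect_faces(frame)
        tracker.init(frame, boxes)
    else:
        # fast "tracking" pass: update the existing boxes
        boxes = tracker.update(frame)
    display(frame, boxes)
    frame_count += 1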
In this tutorial, I will take a different strategy. Since we already have an optimized face detection algorithm (DenseBox) that runs real-time, I will perform the face “detection” on each frame, and use a much simpler tracking algorithm, also executed on each frame.
The object tracking algorithm I decided to use is a simple centroid tracker, implemented by Adrian Rosebrock at PyImageSearch:
Adrian Rosebrock, “Simple Object Tracking with OpenCV”, PyImageSearch, https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/, accessed 15 June 2020
As mentioned previously, this tutorial will reuse the existing pre-optimized densebox model for face detection. We already saw this example in the “Vitis-AI 1.1 flow for Avnet Vitis platforms” tutorial. This time, we will be calling it from a Python script instead of a C++ application. The motivation for using Python is simply that the Python language is widely used in industry for quick algorithm exploration. There is an incredible wealth of Python packages and examples that can be used to quickly prototype an idea.
This tutorial will go through the following steps:
- Step 0 – Overview of the Python scripts
- Step 1 – Create the SD card image for the Vitis-AI 1.1 enabled platform
- Step 2 – Install the tutorial files and required packages
- Step 3 – Execute the face detection and tracking Python scripts
Python implementation of Face Detection
Vitis-AI 1.1, from Xilinx, provides a development flow for AI inference on Xilinx devices. This flow includes an AI engine, called the DPU (Deep-Learning Processing Unit), along with an API for Linux applications, called VART.
This VART API is available for C++ applications, as well as Python scripts.
Most of the provided examples are in C++, while two of the classification examples are provided in Python:
- inception-v1
- resnet50
There is, however, no Python example provided for the following face detection model:
- densebox
Since the best way to understand an API is to write code that makes use of it, I embarked on the task of writing a Python version of the face detection example, making use of the densebox model from the Model Zoo.
As it turns out, I found a verification script in the Model Zoo for the cf_densebox_wider_360_640_1.11G model:
models/cf_densebox_wider_360_640_1.11G/code/test/visualTest/detect.py
This script was used as a reference for the implementation of my code.
To get started, the general format of a Python example, making use of the VART API, is the following:
dpu = runner.Runner("vitis_rundir")[0]
""" Prepare input/output buffers """
...
""" Execute model on DPU """
job_id = dpu.execute_async( inputData, outputData )
dpu.wait(job_id)
""" Retrieve output results """
...
The first line initializes the VART API and specifies the directory where the meta-data for the model can be found. In our Vitis-AI 1.1 platform, we are interested in the 640x360 version of the densebox model, which is located in the “/usr/share/vitis_ai_library/models/densebox_640_360” directory, and has the following content:
/usr/share/vitis_ai_library/models/densebox_640_360/
│
│ densebox_640_360.elf
│ densebox_640_360.prototxt
│ meta.json
The “meta.json” file contains the meta-data for the model:
{
"target": "DPUv2",
"lib": "libvart-dpu-runner.so",
"filename": "densebox_640_360.elf",
"kernel": [ "densebox_640_360" ],
"config_file": "densebox_640_360.prototxt"
}
This file indicates that we are targeting the ”DPUv2” hardware core, using the “libvart-dpu-runner.so” API. This model has a single kernel, which is an uninterrupted sequence of layers making up the CNN model. The kernel name is “densebox_640_360”, and the executable code/data for this kernel is contained in the “densebox_640_360.elf” binary.
The prototxt file indicates important pre-processing information (mean values, scaling factor), which we will need for our Python implementation.
model {
  name : "dense_box_640x360"
  kernel {
    name: "tiling_v7_640"
    mean: 128.0
    mean: 128.0
    mean: 128.0
    scale: 1.0
    scale: 1.0
    scale: 1.0
  }
  model_type : DENSE_BOX
  dense_box_param {
    num_of_classes : 2
    nms_threshold: 0.3
    det_threshold: 0.9
  }
}
The first step is to pre-process the input image, and prepare the input/output buffers:
""" Image pre-processing """
# normalize
img = img - 128.0
# resize
img = cv2.resize(img,(inputWidth,inputHeight))
""" Prepare input/output buffers """
inputData = []
inputData.append(np.empty((inputShape), dtype=np.float32,order='C'))
inputImage = inputData[0]
inputImage[0,...] = img
outputData = []
outputData.append(np.empty((output0Shape), dtype=np.float32,order='C'))
outputData.append(np.empty((output1Shape), dtype=np.float32,order='C'))
The pre-processing involves subtracting the mean (128), scaling (in this case 1.0, so not performed), and resizing the incoming image to the model’s input size of inputWidth x inputHeight (640 x 360).
Next, an input buffer needs to be prepared that contains the input image, and two output buffers need to be allocated for the following two outputs:
- bounding boxes : 160 x 90 x {xmin, ymin, xmax, ymax}
- scores : 160 x 90 x {score0, score1}
Notice that the model output consists of 2D grids of results which are 4x smaller (in both width and height) than the input image.
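The inputShape, output0Shape, and output1Shape variables used in the snippet above, as well as the sizes used further below, are derived from the runner's tensor meta-data. The following is a minimal sketch of how these can be queried, following the pattern of the Vitis-AI 1.1 VART Python samples; the exact tensor attribute names and the NHWC dimension ordering are assumptions and may differ between Vitis-AI releases:

""" Query tensor geometry from the runner (assumes NHWC dimension ordering) """
inputTensors  = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()

inputShape  = tuple(inputTensors[0].dims)        # e.g. (1, 360, 640, 3)
inputHeight = inputTensors[0].dims[1]
inputWidth  = inputTensors[0].dims[2]

output0Shape  = tuple(outputTensors[0].dims)     # bounding boxes, e.g. (1, 90, 160, 4)
output1Shape  = tuple(outputTensors[1].dims)     # scores,         e.g. (1, 90, 160, 2)
output0Height = outputTensors[0].dims[1]
output0Width  = outputTensors[0].dims[2]
output0Size   = output0Height * output0Width * outputTensors[0].dims[3]
output1Size   = outputTensors[1].dims[1] * outputTensors[1].dims[2] * outputTensors[1].dims[3]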
The model is executed on the DPU, then the two output results are retrieved from memory.
""" Execute model on DPU """
job_id = dpu.execute_async( inputData, outputData )
dpu.wait(job_id)
""" Retrieve output results """
outputData0 = outputData[0].reshape(1,output0Size)
bboxes = np.reshape( outputData0, (-1, 4) )
#
outputData1 = outputData[1].reshape(1,output1Size)
scores = np.reshape( outputData1, (-1, 2))
The bounding box coordinates are relative to each grid position, so they need to be post-processed by adding the absolute coordinates of each grid position to the bounding box results. The following Python code implements this in a vectorized style in order to keep performance optimal:
""" Get original face boxes """
gy = np.arange(0,output0Height)
gx = np.arange(0,output0Width)
[x,y] = np.meshgrid(gx,gy)
x = x.ravel()*4
y = y.ravel()*4
bboxes[:,0] = bboxes[:,0] + x
bboxes[:,1] = bboxes[:,1] + y
bboxes[:,2] = bboxes[:,2] + x
bboxes[:,3] = bboxes[:,3] + y
The two score results for each grid location correspond to the scores associated with a bounding box being not present, and being present. These two results need to be normalized into a probability distribution which sums to 1.0. We use the softmax function to perform this step. More specifically, we use a specialized version of softmax, softmax_2, that performs the 2-class normalization for all 160x90 grid locations. Once normalized, we only keep the results for which the probability of a bounding box being present is above a certain threshold.
""" Run softmax """
softmax = softmax_2( scores )
""" Only keep faces for which prob is above detection threshold """
prob = softmax[:,1]
keep_idx = prob.ravel() > self.detThreshold
bboxes = bboxes[ keep_idx, : ]
bboxes = np.array( bboxes, dtype=np.float32 )
prob = prob[ keep_idx ]
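The softmax_2 helper is defined in the facedetect.py module. As a minimal, vectorized sketch of what such a 2-class softmax can look like (the actual implementation in facedetect.py may differ in details):

import numpy as np

def softmax_2(data):
    """ 2-class softmax, applied row-wise to an (N, 2) array of raw scores """
    # subtract the row-wise maximum for numerical stability
    e = np.exp(data - np.max(data, axis=1, keepdims=True))
    # normalize each row into a probability pair that sums to 1.0
    return e / np.sum(e, axis=1, keepdims=True)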
At this point, many bounding boxes remain, most of which are duplicates (overlapping detections) of each other. In order to remove these duplicates, the non-maximum suppression (NMS) algorithm is used, which measures the overlap (intersection over union, or IoU) of each bounding box with respect to the others.
""" Perform Non-Maxima Suppression """
face_indices = []
if ( len(bboxes) > 0 ):
    face_indices = nms_boxes( bboxes, prob, self.nmsThreshold )
faces = bboxes[face_indices]
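The nms_boxes helper is also part of the facedetect.py module. For reference, here is a minimal sketch of a greedy non-maximum suppression in the same vectorized style; the actual helper may differ in details:

import numpy as np

def nms_boxes(boxes, scores, nms_threshold):
    """ Greedy non-maximum suppression (minimal sketch).
        boxes  : (N, 4) array of [xmin, ymin, xmax, ymax]
        scores : (N,)   array of detection probabilities
        Boxes overlapping a kept box by more than nms_threshold IoU are dropped. """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]           # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box with all the other candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # keep only the candidates whose overlap is below the threshold
        order = order[np.where(iou <= nms_threshold)[0] + 1]
    return keep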
The final step is to scale the coordinates of the detected faces back to the original input image size. For this step, I did not code in a vectorized style, but the performance impact should be negligible, since there will typically be fewer than a dozen faces.
# extract bounding box for each face
for i, face in enumerate(faces):
    xmin = max( face[0] * scale_w, 0 )
    ymin = max( face[1] * scale_h, 0 )
    xmax = min( face[2] * scale_w, imgWidth )
    ymax = min( face[3] * scale_h, imgHeight )
    faces[i] = ( int(xmin), int(ymin), int(xmax), int(ymax) )
It is left as an exercise to the reader to re-code this section in a vectorized form. Please share your implementation in the comments below.
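As one possible starting point, a vectorized version of the scaling step could look like the following sketch (assuming faces is the (N, 4) float32 array produced by the NMS step above):

# Vectorized scaling of the face coordinates back to the original image size,
# clipped to the image boundaries (a sketch of one possible implementation)
faces[:, 0] = np.clip(faces[:, 0] * scale_w, 0, imgWidth)
faces[:, 1] = np.clip(faces[:, 1] * scale_h, 0, imgHeight)
faces[:, 2] = np.clip(faces[:, 2] * scale_w, 0, imgWidth)
faces[:, 3] = np.clip(faces[:, 3] * scale_h, 0, imgHeight)
faces = faces.astype(np.int32)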
All of the above code has been encapsulated in the following class:
vitis_ai_vart/facedetect.py
This makes the demo scripts simpler to write and to read. As an example, here is an excerpt of what the code looks like for the face detection example:
...
import runner
from vitis_ai_vart.facedetect import FaceDetect
...
# Initialize Vitis-AI/DPU based face detector
dpu = runner.Runner("/usr/share/vitis_ai_library/models/densebox_640_360")[0]
dpu_face_detector = FaceDetect(dpu,detThreshold,nmsThreshold)
dpu_face_detector.start()
...
while True:
    ...
    faces = dpu_face_detector.process(frame)
    ...
# Stop the face detector
dpu_face_detector.stop()
del dpu
Face Tracking
In order to implement face tracking, the face detection is followed by a simple centroid-based object tracking algorithm. For each detected face, the centroid of the bounding box is calculated and tracked from frame to frame. For more details on this tracking implementation, refer to the original tutorial on PyImageSearch.com:
Adrian Rosebrock, “Simple Object Tracking with OpenCV”, PyImageSearch, https://www.pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/, accessed 15 June 2020
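To give an idea of how the two pieces fit together, here is a simplified sketch of combining the DPU-based face detector with the centroid tracker, using the CentroidTracker interface from PyImageSearch (its update() method returns a dictionary of object IDs mapped to centroids). The actual demo scripts add capture, display and clean-up code around this loop:

import cv2
from pyimagesearch.centroidtracker import CentroidTracker

# Create the tracker once, before the capture loop
tracker = CentroidTracker(maxDisappeared=20)

while True:
    ...
    # Detect faces with the DPU-based detector (see the excerpt above)
    faces = dpu_face_detector.process(frame)

    # Associate the new bounding boxes with existing object IDs
    objects = tracker.update(faces)

    # Annotate each tracked face with its ID at the centroid position
    for (objectID, centroid) in objects.items():
        cv2.putText(frame, "ID {}".format(objectID), (centroid[0] - 10, centroid[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)
    ...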
Single-Threaded vs Multi-Threaded
The main flow for the single-threaded demo scripts is illustrated in the following simplified timing diagram, where “Worker” refers to our “Face Detection” and “Face Tracking” application examples.
The maximum frame rate of the single-threaded implementation is limited by the total execution time of the entire application pipeline: Capture + Worker + Display. This is not ideal, since the CPU may sit idle while waiting for a new frame from the USB camera, and will certainly be waiting for the DPU to finish execution of each kernel.
In order to increase the frame rate, a multi-threaded implementation is also provided which breaks down the application into three main tasks:
- CaptureTask
- WorkerTask
- DisplayTask
The tasks communicate with each other via synchronized queues.
The next simplified diagram illustrates how each of the Tasks can start execution in parallel.
A single thread each is provided for the CaptureTask and the DisplayTask, while a user-configurable number of threads is provided for the WorkerTask, allowing several DPU requests to be pipelined.
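The following is a simplified sketch of that structure, using Python's threading and queue modules. The names (captureQueue, workerTask, cam, dpu_face_detector, num_threads) are illustrative only, and details such as giving each worker thread its own detector instance, preserving frame order, and shutting down cleanly are omitted from this sketch:

import threading
import queue

captureQueue = queue.Queue(maxsize=4)   # frames waiting for the worker(s)
displayQueue = queue.Queue(maxsize=4)   # processed frames waiting for display

def captureTask(cam):
    while True:
        ret, frame = cam.read()
        if not ret:
            break
        captureQueue.put(frame)          # blocks if the workers fall behind

def workerTask(face_detector):
    while True:
        frame = captureQueue.get()
        faces = face_detector.process(frame)     # DPU inference
        displayQueue.put((frame, faces))
        captureQueue.task_done()

def displayTask():
    while True:
        frame, faces = displayQueue.get()
        # ... draw the bounding boxes and display the frame ...
        displayQueue.task_done()

# one capture thread, one display thread, and several worker threads
threads  = [threading.Thread(target=captureTask, args=(cam,), daemon=True),
            threading.Thread(target=displayTask, daemon=True)]
threads += [threading.Thread(target=workerTask, args=(dpu_face_detector,), daemon=True)
            for _ in range(num_threads)]
for t in threads:
    t.start()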
Step 0 – Overview of the Python scripts
The following Python scripts are provided with this tutorial, each in a single-threaded and a multi-threaded (_mt) version:
- avnet_passthrough.py / avnet_passthrough_mt.py : camera capture and display only (no AI processing)
- avnet_face_detection.py / avnet_face_detection_mt.py : face detection
- avnet_face_tracking.py / avnet_face_tracking_mt.py : face detection + centroid-based tracking
Step 1 – Create the SD card image for the Vitis-AI 1.1 enabled platform
For detailed instructions on creating a Vitis-AI 1.1 enabled platform, refer to the “Vitis-AI 1.1 flow for Avnet Vitis platforms” two-part tutorial:
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-1-007b0e
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-2-f18be4
For quick instructions on creating a Vitis-AI 1.1 enabled Ultra96-V2 platform, follow these steps:
1) Download and extract the following pre-built SD card image
- ULTRA96V2 : http://avnet.me/avnet-ultra96v2-vitis-ai-1.1-image (MD5SUM = 7f54ceed152a0c704f5da18c4738b3fc)
2) Program the “Avnet-ULTRA96V2-Vitis-AI-1-1-2020-05-15.img” image to a 16GB micro SD card using Balena Etcher
3) Download the following solution archive for the Vitis-AI 1.1 tutorial, and extract to the micro SD card’s BOOT partition:
- COMMON : http://avnet.me/Avnet-COMMON-Vitis-AI-1-1-image (MD5SUM = 464ecc94368d1cb7deb184b653e740a1)
4) Boot the Ultra96-V2 board with the SD card
5) Perform the following configuration for the WAYLAND desktop, which will allow you to change the resolution of the monitor (for details, refer to the Vitis-AI 1.1 tutorial)
$ cd /mnt/runtime/WAYLAND
$ source ./install.sh
$ cat weston_append.ini >> /etc/xdg/weston/weston.ini
$ source ./change_resolution_1920x1080.sh
6) Perform the following configuration for the VART runtime
$ cd /mnt/runtime/VART
$ source ./install.sh
$ cd /mnt/runtime
$ dpkg -i vitis_ai_model_ULTRA96V2_2019.2-r1.1.1.deb
To verify the Vitis-AI 1.1 enabled platform, perform the following steps:
7) Change to a lower resolution, such as 1280x720
$ source /mnt/runtime/WAYLAND/change_resolution_1280x720.sh
8) Define the DISPLAY environment variable
$ export DISPLAY=:0.0
9) Run the C++ version of the face detection sample
$ cd /mnt/Vitis-AI-Library/overview/samples/facedetect
$ ./test_video_facedetect densebox_640_360 0
Step 2 – Install the tutorial files and required packages
The files for this tutorial can be found on the Avnet github repository:
https://github.com/Avnet/face_py_vart
With an ethernet connection enabled on your Ultra96-V2 embedded platform, use git to clone the tutorial repository:
$ cd /mnt
$ git clone https://github.com/Avnet/face_py_vart
If you do not have an ethernet connection available on your embedded platform, download the tutorial files from the repository on another machine, then copy them to the BOOT partition of your SD card, under a directory called “face_py_vart”.
This tutorial has the following project structure:
/mnt/face_py_vart/
│
│ avnet_face_detection.py
│ avnet_face_detection_mt.py
│ avnet_face_tracking.py
│ avnet_face_tracking_mt.py
│ avnet_passthrough.py
│ avnet_passthrough_mt.py
│
├───pyimagesearch
│ │ centroidtracker.py
│ │ __init__.py
│ └───__pycache__
│
└───vitis_ai_vart
│ facedetect.py
│ __init__.py
└───__pycache__
The “avnet_*.py” scripts are the main demo scripts.
The “vitis_ai_vart” directory contains the Python implementation of the VART based face detection.
The “pyimagesearch” directory contains the reused centroid tracking code from PyImageSearch.com.
The following Python package is required for this tutorial.
With an ethernet connection enabled on your Ultra96-V2 embedded platform, use “pip3 install …” to install the Python package:
$ pip3 install imutils
If you do not have an ethernet connection available on your embedded platform, download the “imutils-0.5.3.tar.gz” package from the following web site:
Then, copy it to your SD card, and install it using the following command:
$ pip3 install imutils-0.5.3.tar.gz
Step 3 – Execute the face detection and tracking Python scripts
Python scripts for the following three demos have been provided, each with a single-threaded and a multi-threaded implementation:
- passthrough (capture and display only, no AI processing)
- face detection
- face tracking
To use the Python scripts for this tutorial, navigate to the “face_py_vart” directory that you extracted on the BOOT partition of the SD card:
$ cd /mnt/face_py_vart
In order to get help on how to use each of the demo Python scripts, use the “-h” option, as described below:
$ python3 avnet_passthrough_mt.py -h
usage: avnet_passthrough_mt.py [-h] [-i INPUT] [-t THREADS]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input camera identifier (default = 0)
  -t THREADS, --threads THREADS
                        number of worker threads (default = 4)
$ python3 avnet_face_detection_mt.py -h
usage: avnet_face_detection_mt.py [-h] [-i INPUT] [-d DETTHRESHOLD] [-n NMSTHRESHOLD] [-t THREADS]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input camera identifier (default = 0)
  -d DETTHRESHOLD, --detthreshold DETTHRESHOLD
                        face detector softmax threshold (default = 0.55)
  -n NMSTHRESHOLD, --nmsthreshold NMSTHRESHOLD
                        face detector NMS threshold (default = 0.35)
  -t THREADS, --threads THREADS
                        number of worker threads (default = 4)
$ python3 avnet_face_tracking_mt.py -h
usage: avnet_face_tracking_mt.py [-h] [-i INPUT] [-d DETTHRESHOLD] [-n NMSTHRESHOLD] [-t THREADS]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input camera identifier (default = 0)
  -d DETTHRESHOLD, --detthreshold DETTHRESHOLD
                        face detector softmax threshold (default = 0.55)
  -n NMSTHRESHOLD, --nmsthreshold NMSTHRESHOLD
                        face detector NMS threshold (default = 0.35)
  -t THREADS, --threads THREADS
                        number of worker threads (default = 4)
Face Detection
To launch the multi-threaded face detection Python script:
$ python3 avnet_face_detection_mt.py -i 0 -d 0.55 -n 0.35 -t 4
Face Tracking
To launch the multi-threaded face tracking Python script:
$ python3 avnet_face_tracking_mt.py -i 0 -d 0.55 -n 0.35 -t 4
Experimenting with Parameters
If you are getting duplicate detected ROIs for the same face, you can try increasing the detection threshold, detThreshold, to a higher value using the “-d 0.90” command line argument.
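For example, to relaunch the multi-threaded face detection script with a stricter detection threshold:
$ python3 avnet_face_detection_mt.py -i 0 -d 0.90 -n 0.35 -t 4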
The centroid-based object tracker allows a detected face to disappear for a certain number of frames, so that a face that is momentarily “lost” keeps its associated “id” when it re-appears. The number of frames during which a face is tolerated as being temporarily lost, maxDisappeared, is defined in the “pyimagesearch/centroidtracker.py” script:
class CentroidTracker():
    def __init__(self, maxDisappeared=20):
Summary
This tutorial described how to access the pre-trained densebox model from the Xilinx Model Zoo for face detection in Python.
This Python-based example was augmented with a simple object tracking algorithm.
Finally, a multi-threaded implementation was used to make better use of the CPU and the DPU (hardware AI engine), in order to achieve higher throughput.
I hope this tutorial will serve as a foundation for additional exploration!