The Xilinx Model Zoo contains many pre-built convolutional neural network models.
This project makes use of several of these models to implement a foundation for creating face applications.
- face detection : densebox_640_360
- face landmark detection : facelandmark
With faces detected, and facial landmarks identified, we can add additional processing, such as head pose estimation.
- Satya Mallick, Head Pose Estimation with OpenCV and DLIB, LearnOpenCV https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ https://github.com/spmallick/learnopencv/blob/master/HeadPose/headPose.cpp
Let's get started!
Step 1 - Create the SD card
Pre-built Vitis-AI 1.3 SD card images have been provided for the following Avnet platforms:
- u96v2_sbc_base : Ultra96-V2 Development Board
- uz7ev_evcc_base : UltraZed-EV SOM (7EV) + FMC Carrier Card
- uz3eg_iocc_base : UltraZed-EG SOM (3EG) + IO Carrier Card
The download links for the pre-built SD card images can be found here:
- Vitis-AI 1.3 Flow for Avnet Vitis Platforms : https://avnet.me/vitis-ai-1.3-project
Once downloaded and extracted, the .img file can be programmed to a 16GB (or larger) micro SD card.
1. Extract the archive to obtain the .img file
2. Program the board-specific SD card image to a 16GB (or larger) micro SD card
a. On a Windows machine, use Balena Etcher or Win32DiskImager (free open-source software)
b. On a Linux machine, use Balena Etcher or the dd utility
$ sudo dd bs=4M if=Avnet-{platform}-Vitis-AI-1-3-{date}.img of=/dev/sd{X} status=progress conv=fsync
Where {X} is a lowercase letter that identifies the device of your SD card. You can use “df -h” to determine which device corresponds to your SD card.
Step 2 - Clone the source code repository
The source code used in this project can be obtained from the following repositories:
- https://github.com/AlbertaBeef/vitis_ai_cpp_examples
- https://github.com/AlbertaBeef/vitis_ai_python_examples
If you have an active internet connection, you can simply clone the repositories to the root directory of your embedded platform:
$ cd ~
$ git clone https://github.com/AlbertaBeef/vitis_ai_cpp_examples
$ git clone https://github.com/AlbertaBeef/vitis_ai_python_examples
Step 3 - Overview of the facedetect example
In order to implement the head pose estimation example, we modify an existing example, facedetect, which can be found in the following directory:
~/Vitis-AI/demo/Vitis-AI-Library/samples/facedetect
If we look at the test_video_facedetect.cpp source code, we can see that it is surprisingly small:
int main(int argc, char *argv[]) {
  string model = argv[1];
  return vitis::ai::main_for_video_demo(
      argc, argv,
      [model] { return vitis::ai::FaceDetect::create(model); },
      process_result, 2);
}
A visual representation of this code is shown in the following diagram:
We can see that the main function makes use of a generic main_for_video_demo() function, passing it a lambda that creates an instance of the FaceDetect class (which provides the create() and run() methods), along with a process_result() callback.
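For context, the process_result() callback is where the detection results get drawn onto the video frame. Below is a minimal sketch, modeled on the process_result.hpp shipped with the facedetect sample; the FaceDetectResult field names match the Vitis-AI Library headers, but treat this as an illustration rather than a verbatim copy:

#include <opencv2/opencv.hpp>
#include <vitis/ai/facedetect.hpp>

// Draw one rectangle per detected face. FaceDetectResult reports
// bounding boxes normalized to [0,1], so scale by the frame size.
static cv::Mat process_result(cv::Mat &image,
                              const vitis::ai::FaceDetectResult &result,
                              bool is_jpeg) {
  for (const auto &r : result.rects) {
    cv::rectangle(image,
                  cv::Rect(r.x * image.cols, r.y * image.rows,
                           r.width * image.cols, r.height * image.rows),
                  cv::Scalar(0, 255, 0), 2);
  }
  return image;
}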
The example can be run using the following commands:
1. After boot, launch the zynqmp_dpu_optimize.sh script, which will optimize the QoS configuration for DDR memory:
$ cd ~/dpu_sw_optimize/zynqmp
$ source ./zynqmp_dpu_optimize.sh
2. Disable the dmesg verbose output:
$ dmesg -D
3. Define the DISPLAY environment variable
$ export DISPLAY=:0.0
4. Change the resolution of the DP monitor to a lower resolution, such as 640x480
$ xrandr --output DP-1 --mode 640x480
5. Launch the facedetect application with the following arguments:
- specify “densebox_640_360” as the first argument
- specify “0” as the second argument, to select the USB camera
$ cd ~/Vitis-AI/demo/Vitis-AI-Library/samples/facedetect
$ ./test_video_facedetect densebox_640_360 0
We can make use of this generic main_for_video_demo(), with a custom class that defines our modified use case(s), as shown in the following diagram:
For the head pose estimation example, the following modifications were made to the facedetect example:
- Adding face landmarks
- Adding the head pose estimation
The following diagram illustrates the modified code for this example.
The modified code can be found in the following location:
~/vitis_ai_cpp_examples/facedetectwithheadpose/test_video_facedetectwithheadpose.cpp
1. To build the head pose estimation application:
$ cd ~/vitis_ai_cpp_examples/facedetectwithheadpose
$ ./build.sh
2. To launch the head pose estimation application:
$ cd ~/vitis_ai_cpp_examples/facedetectwithheadpose
$ ./test_video_facedetectwithheadpose 0
For the head pose estimation example, I have reused the following code:
Head Pose Estimation:
- Satya Mallick, Head Pose Estimation with OpenCV and DLIB, LearnOpenCV: https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ https://github.com/spmallick/learnopencv/blob/master/HeadPose/headPose.cpp
I will not describe the math behind this algorithm, as Mr. Mallick already does an excellent job of that. All we need to know is that the following 6 landmark points are required on our 2D detected face in order to estimate the head pose.
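The heavy lifting is done by OpenCV's solvePnP(), which recovers the rotation and translation that map a generic 3D head model onto those six 2D landmarks. The sketch below follows Mr. Mallick's headPose.cpp (the 3D model coordinates are the generic head model used in that code, and the camera matrix is approximated from the image size):

#include <opencv2/opencv.hpp>
#include <vector>

// image_points must contain, in order: nose tip, chin, left eye left
// corner, right eye right corner, left mouth corner, right mouth corner.
void estimate_head_pose(const std::vector<cv::Point2d> &image_points,
                        const cv::Mat &frame,
                        cv::Mat &rotation_vector, cv::Mat &translation_vector) {
  // Generic 3D head model (arbitrary units), per headPose.cpp.
  std::vector<cv::Point3d> model_points = {
      {   0.0,    0.0,    0.0},   // nose tip
      {   0.0, -330.0,  -65.0},   // chin
      {-225.0,  170.0, -135.0},   // left eye, left corner
      { 225.0,  170.0, -135.0},   // right eye, right corner
      {-150.0, -150.0, -125.0},   // left mouth corner
      { 150.0, -150.0, -125.0}};  // right mouth corner

  // Approximate the camera intrinsics from the frame size, no lens distortion.
  double focal_length = frame.cols;
  cv::Point2d center(frame.cols / 2.0, frame.rows / 2.0);
  cv::Mat camera_matrix = (cv::Mat_<double>(3, 3) <<
      focal_length, 0, center.x,
      0, focal_length, center.y,
      0, 0, 1);
  cv::Mat dist_coeffs = cv::Mat::zeros(4, 1, CV_64F);

  // Solve for the head rotation and translation.
  cv::solvePnP(model_points, image_points, camera_matrix, dist_coeffs,
               rotation_vector, translation_vector);
}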
The facelandmark model from Xilinx provides us with 5 landmarks, corresponding to the two eyes, the nose, and the corners of the mouth, so we are missing the 6th landmark, corresponding to the chin.
In my implementation, I have made a crude estimate of where the chin should be, as sketched below:
- the chin is placed below the mouth at approximately the same offset as the nose below the eyes
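A hypothetical sketch of that estimate (the actual code in test_video_facedetectwithheadpose.cpp may differ in details):

#include <opencv2/core.hpp>

// Estimate the chin by mirroring the eye-center-to-nose offset
// below the mouth center: a crude but workable 6th landmark.
cv::Point2f estimate_chin(const cv::Point2f &left_eye, const cv::Point2f &right_eye,
                          const cv::Point2f &nose,
                          const cv::Point2f &left_mouth, const cv::Point2f &right_mouth) {
  cv::Point2f eye_center = (left_eye + right_eye) * 0.5f;
  cv::Point2f mouth_center = (left_mouth + right_mouth) * 0.5f;
  return mouth_center + (nose - eye_center);
}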
A similar implementation is also provided in Python.
1. To launch the Python version of the head pose estimation example:
$ cd ~/vitis_ai_python_examples/face_applications
$ python3 face_headpose.py
Known Limitations
The head pose estimation implemented in this project has certain limitations. It does not work well when the head is looking up or down. Two factors may contribute to this:
- the landmarks used for the eyes correspond to the centers of the eyes, whereas the head pose source code assumes the outer corners of the eyes
- the landmark used for the chin is estimated, and probably not always correct
Can you improve this implementation?
- would you calculate the chin location differently?
- would you use an alternate face landmark model, one that includes the chin?
Step 6 - Improving results with DLIB
In an attempt to improve the results, I experimented with the functionality provided by DLIB, a very popular library for face detection and landmarks. In order to speed things up, I will do this in Python (instead of C++).
1. The first thing that needs to be done is to install DLIB.
The quickest way to install the dlib package (for use with Python only) is with the pip3 command:
$ pip3 install dlib
The longer way to install dlib (for use by both Python and C++) is to build it from source:
# download source code from dlib.net
wget http://dlib.net/files/dlib-19.21.tar.bz2
tar xvf dlib-19.21.tar.bz2
cd dlib-19.21
# build/install for use with C++
mkdir build
cd build
cmake ..
cmake --build . --config Release
sudo make install
# build/install for use with python
python setup.py install
Both of these methods require a working internet connection and will take a long time, since the package will need to be built for our embedded platform.
2. Make sure you have the latest version of the repository contents
$ cd ~/vitis_ai_python_examples
$ git pull
3. Next, run the following script
$ cd ~/vitis_ai_python_examples/face_applications_dlib
$ python3 face_headpose_dlib.py
This version of the script has the following additional functionality:
- a status display, which includes FPS
- press 'd' to toggle between face detection algorithms (VART versus DLIB)
- press 'l' to toggle between face landmark algorithms (VART versus DLIB)
Note that VART is short for Vitis-AI RunTime, and is used here to denote the use of the pre-built Vitis-AI models.
The first observation is that the VART face detection runs 5x faster than the DLIB face detection, yet provides similar results.
The second observation is that the VART face landmarks are not at the same places as the DLIB face landmarks, which probably explains why we are getting better head pose results with the DLIB face landmarks (see the sketch after this list):
- eye landmarks : located at the center of the eyes for VART, at the outer corners for DLIB
- nose landmark : located at the bottom of the nose for VART, at the tip of the nose for DLIB
- chin landmark : estimated for VART, located correctly for DLIB
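For reference, DLIB's 68-point shape predictor provides all six points needed by the head pose code directly. The sketch below shows the usual index mapping (indices follow the iBUG 68-point annotation used by shape_predictor_68_face_landmarks.dat):

#include <dlib/image_processing.h>
#include <opencv2/core.hpp>
#include <vector>

// Pick the six solvePnP landmarks out of a 68-point dlib shape.
std::vector<cv::Point2d> pick_pnp_points(const dlib::full_object_detection &shape) {
  auto pt = [&shape](unsigned long i) {
    return cv::Point2d(shape.part(i).x(), shape.part(i).y());
  };
  return {pt(30),   // nose tip
          pt(8),    // chin
          pt(36),   // left eye, outer corner
          pt(45),   // right eye, outer corner
          pt(48),   // left mouth corner
          pt(54)};  // right mouth corner
}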
The third observation is that the VART face landmarks perform better when the face is not front-facing.
So, is there a winner?
For performance, definitely the Vitis-AI based face detection and landmark implementations.
For head pose results:
- For the front-facing use case, Vitis-AI based face detection with DLIB based landmarks provides better results.
- For the side-facing use case, Vitis-AI based face detection and landmarks provide better results.
Step 7 - Going further with DLIB face landmarks
I encourage you to run the version of the script that displays all the landmarks for each of the algorithms, as follows:
$ cd ~/vitis_ai_python_examples/face_applications_dlib
$ python3 face_landmark_dlib.py
What other applications can you think of that make use of face landmarks? Share your thoughts in the comments below.
Conclusion
I hope this tutorial will help you get started with face applications on the Ultra96-V2 and other Avnet platforms.
If there is any other related content that you would like to see, please share your thoughts in the comments below.
Revision History
2021/03/15 - First Version
2021/03/18 - Added "Step 6 - Improving results with DLIB"