The Xilinx Model Zoo contains many pre-built convolutional neural network models.
This project makes use of several of these models to implement a foundation for creating face applications.
- face detection : densebox_640_360
- face landmark detection : facelandmark
With faces detected, and facial landmarks identified, we can add additional processing, such as head pose estimation.
- Satya Mallick, Head Pose Estimation with OpenCV and DLIB, LearnOpenCV https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ https://github.com/spmallick/learnopencv/blob/master/HeadPose/headPose.cpp
Let's get started!
Step 1 - Create the SD card
Pre-built Vitis-AI 1.3 SD card images have been provided for the following Avnet platforms:
- u96v2_sbc_base : Ultra96-V2 Development Board
- uz7ev_evcc_base : UltraZed-EV SOM (7EV) + FMC Carrier Card
- uz3eg_iocc_base : UltraZed-EG SOM (3EG) + IO Carrier Card
The download links for the pre-built SD card images can be found here:
- Vitis-AI 1.3 Flow for Avnet Vitis Platforms : https://avnet.me/vitis-ai-1.3-project
Once downloaded and extracted, the .img file can be programmed to a 16GB (or larger) micro SD card.
1. Extract the archive to obtain the .img file
2. Program the board-specific SD card image to a 16GB (or larger) micro SD card
a. On a Windows machine, use Balena Etcher or Win32DiskImager (free open-source software)
b. On a Linux machine, use Balena Etcher or the dd utility
$ sudo dd bs=4M if=Avnet-{platform}-Vitis-AI-1-3-{date}.img of=/dev/sd{X} status=progress conv=fsync
Where {X} is a lowercase letter that identifies the device of your SD card. You can use “df -h” to determine which device corresponds to your SD card.
Step 2 - Clone the source code repository
The source code used in this project can be obtained from the following repositories:
- https://github.com/AlbertaBeef/vitis_ai_cpp_examples
- https://github.com/AlbertaBeef/vitis_ai_python_examples
If you have an active internet connection, you can simply clone the repositories to the root directory of your embedded platform:
$ cd ~
$ git clone https://github.com/AlbertaBeef/vitis_ai_cpp_examples
$ git clone https://github.com/AlbertaBeef/vitis_ai_python_examples
Step 3 - Overview of the facedetect example
In order to implement the head pose estimation example, we modify an existing example, facedetect, which can be found in the following directory:
~/Vitis-AI/demo/Vitis-AI-Library/samples/facedetect
If we look at the test_video_facedetect.cpp source code, we can see that it is surprisingly small:
int main(int argc, char *argv[]) {
  string model = argv[1];
  return vitis::ai::main_for_video_demo(
      argc, argv,
      [model] { return vitis::ai::FaceDetect::create(model); },
      process_result, 2);
}
A visual representation of this code is shown in the following diagram:
We can see that the main function makes use of a generic main_for_video_demo() function, passing it a lambda that creates an instance of the FaceDetect class (which provides the create() and run() methods), along with a process_result() callback.
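For context, the process_result() callback is where the detection results get drawn onto the video frame. Below is a minimal sketch, modeled on the process_result.hpp shipped with the facedetect sample; the FaceDetectResult field names match the Vitis-AI Library headers, but treat this as an illustration rather than a verbatim copy:

#include <opencv2/opencv.hpp>
#include <vitis/ai/facedetect.hpp>

// Draw one rectangle per detected face. FaceDetectResult reports
// bounding boxes normalized to [0,1], so scale by the frame size.
static cv::Mat process_result(cv::Mat &image,
                              const vitis::ai::FaceDetectResult &result,
                              bool is_jpeg) {
  for (const auto &r : result.rects) {
    cv::rectangle(image,
                  cv::Rect(r.x * image.cols, r.y * image.rows,
                           r.width * image.cols, r.height * image.rows),
                  cv::Scalar(0, 255, 0), 2);
  }
  return image;
}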
The example can be run using the following commands:
1. After boot, launch the zynqmp_dpu_optimize.sh script, which will optimize the QoS configuration for DDR memory:
$ cd ~/dpu_sw_optimize/zynqmp
$ source ./zynqmp_dpu_optimize.sh
2. Disable the dmesg verbose output:
$ dmesg -D
3. Define the DISPLAY environment variable
$ export DISPLAY=:0.0
4. Change the resolution of the DP monitor to a lower resolution, such as 640x480
$ xrandr --output DP-1 --mode 640x480
5. Launch the facedetect application with the following arguments:
- specify “densebox_640_360” as the first argument
- specify “0” as the second argument, to select the USB camera
$ cd ~/Vitis-AI/demo/Vitis-AI-Library/samples/facedetect
$ ./test_video_facedetect densebox_640_360 0
We can make use of this generic main_for_video_demo(), with a custom class that defines our modified use case(s), as shown in the following diagram:
For the head pose estimation example, the following modifications were made to the facedetect example:
- Adding face landmarks
- Adding the head pose estimation
The following diagram illustrates the modified code for this example.
The modified code can be found in the following location:
~/vitis_ai_cpp_examples/facedetectwithheadpose/test_video_facedetectwithheadpose.cpp
1. To build the head pose estimation application:
$ cd ~/vitis_ai_cpp_examples/facedetectwithheadpose
$ ./build.sh
2. To launch the head pose estimation application:
$ cd ~/vitis_ai_cpp_examples/facedetectwithheadpose
$ ./test_video_facedetectwithheadpose 0
For the head pose estimation example, I have reused the following code:
Head Pose Estimation:
- Satya Mallick, Head Pose Estimation with OpenCV and DLIB, LearnOpenCV: https://learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ https://github.com/spmallick/learnopencv/blob/master/HeadPose/headPose.cpp
I will not describe the math behind this algorithm, as Mr. Mallick already does an excellent job of that. All we need to know is that the following 6 landmark points are required on our 2D detected face in order to estimate the head pose.
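The heavy lifting is done by OpenCV's solvePnP(), which recovers the rotation and translation that map a generic 3D head model onto those six 2D landmarks. The sketch below follows Mr. Mallick's headPose.cpp (the 3D model coordinates are the generic head model used in that code, and the camera matrix is approximated from the image size):

#include <opencv2/opencv.hpp>
#include <vector>

// image_points must contain, in order: nose tip, chin, left eye left
// corner, right eye right corner, left mouth corner, right mouth corner.
void estimate_head_pose(const std::vector<cv::Point2d> &image_points,
                        const cv::Mat &frame,
                        cv::Mat &rotation_vector, cv::Mat &translation_vector) {
  // Generic 3D head model (arbitrary units), per headPose.cpp.
  std::vector<cv::Point3d> model_points = {
      {   0.0,    0.0,    0.0},   // nose tip
      {   0.0, -330.0,  -65.0},   // chin
      {-225.0,  170.0, -135.0},   // left eye, left corner
      { 225.0,  170.0, -135.0},   // right eye, right corner
      {-150.0, -150.0, -125.0},   // left mouth corner
      { 150.0, -150.0, -125.0}};  // right mouth corner

  // Approximate the camera intrinsics from the frame size, no lens distortion.
  double focal_length = frame.cols;
  cv::Point2d center(frame.cols / 2.0, frame.rows / 2.0);
  cv::Mat camera_matrix = (cv::Mat_<double>(3, 3) <<
      focal_length, 0, center.x,
      0, focal_length, center.y,
      0, 0, 1);
  cv::Mat dist_coeffs = cv::Mat::zeros(4, 1, CV_64F);

  // Solve for the head rotation and translation.
  cv::solvePnP(model_points, image_points, camera_matrix, dist_coeffs,
               rotation_vector, translation_vector);
}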
The facelandmark model from Xilinx provides us with 5 landmarks, corresponding to the two eyes, the nose, and the corners of the mouth, so we are missing the 6th landmark, corresponding to the chin.
In my implementation, I have made a crude estimate of where the chin should be, as sketched below:
- the chin is placed below the mouth at approximately the same offset as the nose below the eyes
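A hypothetical sketch of that estimate (the actual code in test_video_facedetectwithheadpose.cpp may differ in details):

#include <opencv2/core.hpp>

// Estimate the chin by mirroring the eye-center-to-nose offset
// below the mouth center: a crude but workable 6th landmark.
cv::Point2f estimate_chin(const cv::Point2f &left_eye, const cv::Point2f &right_eye,
                          const cv::Point2f &nose,
                          const cv::Point2f &left_mouth, const cv::Point2f &right_mouth) {
  cv::Point2f eye_center = (left_eye + right_eye) * 0.5f;
  cv::Point2f mouth_center = (left_mouth + right_mouth) * 0.5f;
  return mouth_center + (nose - eye_center);
}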
A similar implementation is also provided in Python.
1. To launch the Python version of the head pose estimation example:
$ cd ~/vitis_ai_python_examples/face_applications
$ python3 face_headpose.py
Known Limitations
The head pose estimation implemented in this project has certain limitations. It does not work well when the head is looking up or down. Two factors may contribute to this:
- the landmarks used for the eyes correspond to the centers of the eyes, whereas the head pose source code assumes the outer corners of the eyes
- the landmark used for the chin is estimated, and probably not always correct
Can you improve this implementation?
- would you calculate the chin location differently?
- would you use an alternate face landmark model, one that includes the chin?
Step 6 - Improving results with DLIB
In an attempt to improve the results, I experimented with the functionality provided by DLIB, a very popular library for face detection and landmarks. In order to speed things up, I will do this in Python (instead of C++).
1. The first thing that needs to be done is to install DLIB.
The quickest way to install the dlib package (for use with Python only) is with the pip3 command:
$ pip3 install dlib
The longer way to install dlib (for use by both Python and C++) is to build it from source:
# download source code from dlib.net
wget http://dlib.net/files/dlib-19.21.tar.bz2
tar xvf dlib-19.21.tar.bz2
cd dlib-19.21
# build/install for use with C++
mkdir build
cd build
cmake ..
cmake --build . --config Release
sudo make install
# build/install for use with python
python setup.py install
Both of these methods require a working internet connection and will take a long time, since the package will need to be built for our embedded platform.
2. Make sure you have the latest version of the repository contents
$ cd ~/vitis_ai_python_examples
$ git pull
3. Next, run the following script
$ cd ~/vitis_ai_python_examples/face_applications_dlib
$ python3 face_headpose_dlib.py
This version of the script has the following additional functionality:
- a status display, which includes FPS
- press 'd' to toggle between face detection algorithms (VART versus DLIB)
- press 'l' to toggle between face landmark algorithms (VART versus DLIB)
Note that VART is short for Vitis-AI RunTime, and is used here to denote the use of the pre-built Vitis-AI models.
The first observation is that the VART face detection runs 5x faster than the DLIB face detection, yet provides similar results.
The second observation is that the VART face landmarks are not at the same places as the DLIB face landmarks, which probably explains why we are getting better head pose results with the DLIB face landmarks (see the sketch after this list):
- eye landmarks : located at the center of the eyes for VART, at the outer corners for DLIB
- nose landmark : located at the bottom of the nose for VART, at the tip of the nose for DLIB
- chin landmark : estimated for VART, located correctly for DLIB
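For reference, DLIB's 68-point shape predictor provides all six points needed by the head pose code directly. The sketch below shows the usual index mapping (indices follow the iBUG 68-point annotation used by shape_predictor_68_face_landmarks.dat):

#include <dlib/image_processing.h>
#include <opencv2/core.hpp>
#include <vector>

// Pick the six solvePnP landmarks out of a 68-point dlib shape.
std::vector<cv::Point2d> pick_pnp_points(const dlib::full_object_detection &shape) {
  auto pt = [&shape](unsigned long i) {
    return cv::Point2d(shape.part(i).x(), shape.part(i).y());
  };
  return {pt(30),   // nose tip
          pt(8),    // chin
          pt(36),   // left eye, outer corner
          pt(45),   // right eye, outer corner
          pt(48),   // left mouth corner
          pt(54)};  // right mouth corner
}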
The third observation is that the VART face landmarks perform better when the face is not front-facing.
So, is there a winner?
For performance, definitely the Vitis-AI based face detection and landmark implementations.
For head pose results:
- For the front-facing use case, Vitis-AI based face detection with DLIB based landmarks provides better results.
- For the side-facing use case, Vitis-AI based face detection and landmarks provide better results.
Step 7 - Going further with DLIB face landmarks
I encourage you to run the version of the script that displays all the landmarks for each of the algorithms, as follows:
$ cd ~/vitis_ai_python_examples/face_applications_dlib
$ python3 face_landmark_dlib.py
What other applications can you think of that make use of face landmarks? Share your thoughts in the comments below.
Conclusion
I hope this tutorial will help you get started with face applications on the Ultra96-V2 and other Avnet platforms.
If there is any other related content that you would like to see, please share your thoughts in the comments below.
Revision History
2021/03/15 - First Version
2021/03/18 - Added "Step 6 - Improving results with DLIB"