This tutorial builds a real-time face mask detection application running on the Ultra96-V2 board using Vitis-AI. The achieved frame rate is 22-24 fps.
This tutorial consists of the following steps.
- 1. train YOLOv3-tiny model on darknet
- 2. set up Vitis-AI development environment and Ultra96-V2 hardware image
- 3. quantize and compile YOLOv3-tiny model
- 4. create YOLOv3-tiny face mask detection application
- 5. deploy YOLOv3-tiny application to Ultra96-V2
Environment: Ubuntu 18.04 and Vitis-AI v1.1
1. train YOLOv3-tiny model on darknet
This step is done in the host environment, not in the docker environment.
- download darknet and face mask dataset
git clone https://github.com/pjreddie/darknet
git clone https://github.com/VictorLin000/YOLOv3_mask_detect
cp YOLOv3_mask_detect/Mask darknet/
- create custom model cfg file
cd darknet
cp cfg/yolov3-tiny.cfg cfg/yolov3-tiny_mask.cfg
- Modify the input image resolution: the resolution is changed from 416x416 to 224x224 for real-time inference.
- Modify the number of classes: in this tutorial there are three classes: good/bad/none. Modify the classes parameter of each yolo layer and the filters parameter of the convolution layer just before it. The filter value is calculated from the formula 3*(n_classes + 5) (see the quick check after the diff below).
- Modify the maxpool size: to avoid an error during the darknet-to-caffe model conversion in step 3, the maxpool size at line 94 is changed from 2 to 1.
Edit cfg/yolov3-tiny_mask.cfg as follows.
diff cfg/yolov3-tiny.cfg cfg/yolov3-tiny_mask.cfg
8,9c8,9
< width=416
< height=416
---
> width=224
> height=224
94c94
< size=2
---
> size=1
127c127
< filters=255
---
> filters=24
135c135
< classes=80
---
> classes=3
171c171
< filters=255
---
> filters=24
177c177
< classes=80
---
> classes=3
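As a quick check of the filter rule mentioned above, the new value can be derived directly (a minimal sketch in Python; the numbers simply follow the 3*(n_classes + 5) formula):
# Each of the 3 anchors per scale predicts (n_classes + 5) values:
# 4 box coordinates + 1 objectness score + n_classes class scores.
n_classes = 3  # good / bad / none
filters = 3 * (n_classes + 5)
print(filters)  # -> 24, matching filters=24 above (the original 255 is 3 * (80 + 5))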
- Train the YOLOv3-tiny model on the custom dataset
1. Build darknet. To accelerate training with a GPU, enable the GPU and CUDNN options in the Makefile.
GPU=1
CUDNN=1
OPENCV=1
OPENMP=1
DEBUG=0
Set the proper ARCH value according to your GPU. In my case I use an RTX 2070 (compute capability 7.5), so I add the following line to the Makefile.
ARCH= -gencode arch=compute_75,code=[sm_75,compute_75]
After editing the Makefile, build darknet.
make -j 8
2. After darknet is built, train YOLOv3-tiny on the custom dataset. For training, edit the beginning of cfg/yolov3-tiny_mask.cfg: comment out lines 3-4 and uncomment lines 6-7.
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=16
subdivisions=2
NOTE: When you train on a GPU, decrease the batch value or increase the subdivisions value according to your GPU memory size; darknet processes batch/subdivisions images per GPU pass (e.g. batch=16, subdivisions=2 means 8 images at a time). If GPU memory is not enough, a "CUDA Error: out of memory" error occurs.
3. Edit the paths in Mask/obj.data:
classes = 3
train = <your-working-directory-path>/darknet/Mask/train.txt
valid = <your-working-directory-path>/darknet/Mask/test.txt
names = <your-working-directory-path>/darknet/Mask/obj.names
backup = <your-working-directory-path>/darknet/backup/
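If the image paths inside Mask/train.txt and Mask/test.txt do not match your working directory, a small helper like the one below can rewrite them (a hypothetical sketch, not part of the original dataset; WORK_DIR and the Mask/yolo image folder are assumptions, so adjust them to how the dataset is actually laid out on your machine):
# Hypothetical helper: re-anchor every image path in the darknet list files
# under your own working directory. Adjust the paths to your setup.
import os

WORK_DIR = "/home/user/work/darknet/Mask"   # assumption: your Mask directory
IMG_DIR = os.path.join(WORK_DIR, "yolo")    # assumption: images live in Mask/yolo

for list_name in ("train.txt", "test.txt"):
    list_path = os.path.join(WORK_DIR, list_name)
    with open(list_path) as f:
        names = [os.path.basename(l.strip()) for l in f if l.strip()]
    with open(list_path, "w") as f:
        for name in names:
            f.write(os.path.join(IMG_DIR, name) + "\n")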
4. Download the pretrained weights and extract the first 15 layers for training on the custom dataset. The extracted weight file yolov3-tiny.conv.15 is used as the initial weights for training.
wget https://pjreddie.com/media/files/yolov3-tiny.weights
./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15
Run training by the following command:
./darknet detector train "./Mask/obj.data" "./cfg/yolov3-tiny_mask.cfg" "./yolov3-tiny.conv.15"
A smaller avg (average loss) value on the console indicates better training progress. This time I finished training after 60,000 iterations.
- Test your trained model on the host machine
After training, test the trained model. Modify cfg/yolov3-tiny_mask.cfg for testing:
[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=16
#subdivisions=2
Run test for image:
./darknet detector test ./Mask/obj.data ./cfg/yolov3-tiny_mask.cfg backup/yolov3-tiny_mask_60000.weights ./Mask/demo/Mask_121.jpg
You will get the predicted result like this:
2. set up Vitis-AI development environment and Ultra96-V2 hardware image
The construction of the Vitis-AI environment on the host machine and of the DNN model execution environment on the Ultra96 is described in detail on the following page. You can also download a pre-built SD card image of the Ultra96 runtime environment from the link.
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-1-007b0e
Create the SD card image and save "custom.json", the DPU configuration file; it is used as the arch argument when compiling in step 3.
3. quantize and compile YOLOv3-tiny model
This step is performed in the docker environment.
This step references the following tutorial. NOTE: the commands here differ slightly from the referenced tutorial because it does not use the Vitis-AI environment.
https://www.hackster.io/LogicTronix/yolov3-tiny-tutorial-darknet-to-caffe-for-xilinx-dnndk-4529df
Launch the docker environment from the host machine.
./docker_run.sh xilinx/vitis-ai-gpu:latest
Activate the Vitis-AI caffe environment in docker.
conda activate vitis-ai-caffe
- Convert darknet model to caffe model
Download "convert.py", the darknet-to-caffe conversion script.
https://github.com/Xilinx/Vitis-AI/tree/v1.2.1/alveo/apps/yolo/darknet_to_caffe
Modify lines 453 and 455 to avoid an error caused by the maxpool padding size not being an integer: in Python 3 the / operator returns a float while Caffe expects an integer pad, so integer division // is used.
453c453
< pooling_param['pad'] = str((int(block['size'])-1)/2)
---
> pooling_param['pad'] = str((int(block['size'])-1)//2)
455c455
< pooling_param['pad'] = str((int(block['size'])-1)/2)
---
> pooling_param['pad'] = str((int(block['size'])-1)//2)
Copy the model cfg file and the trained weights to the working directory, then run the conversion.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# mkdir caffe_converted
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# python convert.py yolov3-tiny_mask.cfg yolov3-tiny_mask_60000.weights caffe_converted/yolov3.prototxt caffe_converted/yolov3.caffemodel
- Quantize caffe model
First, create a directory for quantization and copy the converted caffe model into it.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# mkdir quantization
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# cp caffe_converted/yolov3.prototxt ./quantization/
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# cp caffe_converted/yolov3.caffemodel ./quantization/
Edit ./quantization/yolov3.prototxt for quantization: comment out lines 2-7 (the original Input layer) and add the ImageData layer as shown below.
name: "Darkent2Caffe"
# layer {
# name: "data"
# type: "Input"
# top: "data"
# input_param: { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
# }
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: false
yolo_height:224 #change height according to Darknet model
yolo_width:224 #change width according to Darknet model
}
image_data_param {
source:"/workspace/darknet/Mask/calib_data.txt" #change path accordingly
root_folder:"/workspace/darknet/Mask/yolo/" #change path accordingly
batch_size: 1
shuffle: false
}
}
Prepare calib_data.txt for the calibration process during quantization. The file looks like this:
Mask_193.jpg 0
Mask_360.jpg 0
Mask_342.jpg 0
...
"0" means that the image is used in the calibration process. root_folder is specified as the root path of training image.
Run quantization by the following command.
vai_q_caffe quantize \
-model ./yolov3.prototxt \
-weights ./yolov3.caffemodel \
-gpu 0 \
-calib_iter 1000
After the quantization process completes, you will see the following lines on the console.
--------------------------------------------------
Output Quantized Train&Test Model: "./quantize_results/quantize_train_test.prototxt"
Output Quantized Train&Test Weights: "./quantize_results/quantize_train_test.caffemodel"
Output Deploy Weights: "./quantize_results/deploy.caffemodel"
Output Deploy Model: "./quantize_results/deploy.prototxt"
- Compile quantized model for DPU
First, create a directory for compilation and copy the quantized caffe model into it.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/xilinx_competition# mkdir compile
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/xilinx_competition# cp quantization/quantize_results/deploy.* ./compile
Modify deploy.prototxt for compilation: comment out lines 5-9 (the transform_param block), as shown below.
layer {
name: "data"
type: "Input"
top: "data"
# transform_param {
# mirror: false
# yolo_height: 224
# yolo_width: 224
# }
Run compilation with the following command. In the arch argument, specify the json file that represents the DPU configuration created in step 2.
vai_c_caffe --prototxt=./deploy.prototxt \
--caffemodel=./deploy.caffemodel \
--output_dir=compiled \
--net_name=yolov3tiny \
--arch=/workspace/ultra96v2/ultra96v2_vitis_flow_tutorial_1/custom.json
You will see the following output after compilation.
Kernel topology "yolov3tiny_kernel_graph.jpg" for network "yolov3tiny"
kernel list info for network "yolov3tiny"
Kernel ID : Name
0 : yolov3tiny
Kernel Name : yolov3tiny
--------------------------------------------------------------------------------
Kernel Type : DPUKernel
Code Size : 0.08MB
Param Size : 8.27MB
Workload MACs : 1578.74MOPS
IO Memory Space : 0.34MB
Mean Value : 0, 0, 0,
Total Tensor Count : 19
Boundary Input Tensor(s) (H*W*C)
data:0(0) : 224*224*3
Boundary Output Tensor(s) (H*W*C)
layer15_conv:0(0) : 7*7*24
layer22_conv:0(1) : 14*14*24
Total Node Count : 17
Input Node(s) (H*W*C)
layer0_conv(0) : 224*224*3
Output Node(s) (H*W*C)
layer15_conv(0) : 7*7*24
layer22_conv(0) : 14*14*24
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
"./compiled/dpu_yolov3tiny.elf" file is generated by this process, and is used for building application
4. create YOLOv3-tiny face mask detection application
Create the YOLOv3-tiny face mask detection application based on the following sample.
https://github.com/Xilinx/Vitis-AI/tree/v1.1/mpsoc/vitis_ai_dnndk_samples/adas_detection
The sample application is based on YOLOv3, not YOLOv3-tiny.
My code is uploaded to my GitHub repository. Based on the sample application, I modified some code for YOLOv3-tiny and added an inference mode for images captured from a webcam.
https://github.com/lp6m/tiny_yolov3_face_mask_detect
Copy the compiled model dpu_yolov3tiny.elf from step 3 to the model/ directory.
├── Makefile
├── model
│ └── dpu_yolov3tiny.elf
├── src
│ ├── main.cc
│ └── utils.h
├── Mask_121.jpg
└── youtube_320.mp4
Here, I explain some parts of the modified code.
- modify the output nodes in main.cc
const string outputs_node[2] = {"layer15_conv", "layer22_conv"};
- modify the anchors in utils.h: these values correspond to the anchors in "yolov3-tiny_mask.cfg" (see the decoding sketch below)
vector<float> biases{81,82, 135, 169, 344,319, 10,14, 23,27, 37,58};
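For reference, these biases feed the standard YOLOv3 box decoding. The sketch below illustrates the math for a single grid cell (an illustration of the standard formula, not the exact code in main.cc; the pairing of the large anchors with the coarse 7x7 output follows the usual tiny-YOLOv3 convention):
# Standard YOLOv3 box decoding for one grid cell (illustration only).
# tx, ty, tw, th are raw network outputs; (cx, cy) is the cell index;
# (anchor_w, anchor_h) is one pair taken from the biases vector above.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, grid_w, grid_h, anchor_w, anchor_h,
               net_w=224, net_h=224):
    bx = (sigmoid(tx) + cx) / grid_w      # box center x, normalized to 0..1
    by = (sigmoid(ty) + cy) / grid_h      # box center y, normalized to 0..1
    bw = anchor_w * math.exp(tw) / net_w  # box width, normalized to 0..1
    bh = anchor_h * math.exp(th) / net_h  # box height, normalized to 0..1
    return bx, by, bw, bh

# e.g. a detection in cell (3, 4) of the 7x7 output with anchor (81, 82):
print(decode_box(0.2, -0.1, 0.3, 0.1, 3, 4, 7, 7, 81, 82))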
I also added a real-time object detection mode for images captured from the USB camera.
- modify Makefile for cross compile
CXX ?= g++
CC ?= gcc
- Build the application on the host machine with the cross-compile environment.
You can cross-compile after setting up the sysroot environment on the host machine.
wget -O sdk.sh https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh
chmod +x sdk.sh
./sdk.sh -d ~/work/petalinux_sdk_vai_1_1_dnndk
unset LD_LIBRARY_PATH
source ~/work/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
cd vitis-ai_v1.1_dnndk
./install.sh $SDKTARGETSYSROOT
cd yolov3_face_mask_detection
make
5. deploy YOLOv3-tiny application to Ultra96-V2
- Create SD card
Divide the SD card into two partitions as described in the tutorial referenced in step 2: the first partition (BOOT) is vfat, and the second partition (rootfs) is ext4.
#First partition
sudo cp prj/Vitis/binary_container_1/sd_card/BOOT.BIN /media/<username>/BOOT/
sudo cp prj/Vitis/binary_container_1/sd_card/image.ub /media/<username>/BOOT/
#Second partition
sudo tar xvf rootfs.tar.gz -C /media/<username>/rootfs/
#copy DNNDDK(Vitis-AI) runtime to sdcard
wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
sudo cp -r ./vitis-ai_v1.1_dnndk /media/<username>/rootfs/home/root
#copy dpu bitstream to sdcard
sudo cp prj/Vitis/binary_container_1/sd_card/dpu.xclbin /media/<username>/rootfs/usr/lib
#copy application to sdcard
sudo cp -r ./yolov3_face_mask_detection /media/<username>/rootfs/home/root/
#wait for copy completion (you can safely remove SD card)
sync
- Install Vitis-AI DNNDK runtime on Ultra96
Insert the SD card into the Ultra96-V2, connect the power supply, a DP-to-HDMI display adapter, and a USB keyboard/mouse/camera, then turn on the device.
Install the Vitis-AI DNNDK runtime on the device from the console. This process is required only on the first boot.
cd vitis-ai_v1.1_dnndk
./install.sh
- Run application
Before you run the application, you have to change the display resolution.
export DISPLAY=:0.0
xrandr --output DP-1 --mode 640x480
Run the face mask detection. There are three modes: image/video/USB camera.
# image mode
./yolo Mask/Mask_121.jpg i
# video mode
./yolo youtube_320.mp4 v
# webcam realtime mode
./yolo camera c
This video shows real-time face mask detection on images captured from the USB camera. The FPS is displayed in the upper left of the screen; 22-24 fps is achieved.