This tutorial builds a real-time face mask detection application running on the Ultra96-V2 board using Vitis-AI. The achieved frame rate is 22-24 fps.
This tutorial consists of the following steps.
- 1. train YOLOv3-tiny model on darknet
- 2. set up Vitis-AI development environment and Ultra96-V2 hardware image
- 3. quantize and compile YOLOv3-tiny model
- 4. create YOLOv3-tiny face mask detection application
- 5. deploy YOLOv3-tiny application to Ultra96-V2
Environment: Ubuntu 18.04 and Vitis-AI v1.1
1. train YOLOv3-tiny model on darknet
This step is done in the host environment, not in the docker environment.
- download darknet and face mask dataset
git clone https://github.com/pjreddie/darknet
git clone https://github.com/VictorLin000/YOLOv3_mask_detect
cp YOLOv3_mask_detect/Mask darknet/
- create custom model cfg file
cd darknet
cp cfg/yolov3-tiny.cfg cfg/yolov3-tiny_mask.cfg
- Modify the input image resolution: the resolution is changed from 416x416 to 224x224 for real-time inference.
- Modify the number of classes: in this tutorial there are three classes: good/bad/none. Modify the classes parameter of each yolo layer and the filters parameter of the convolution layer just before it. The filter value is calculated from the formula 3*(n_classes + 5) (see the quick check after the diff below).
- Modify the maxpool size: to avoid an error during the darknet-to-caffe model conversion in step 3, the maxpool size at line 94 is changed from 2 to 1.
Edit cfg/yolov3-tiny_mask.cfg as follows.
diff cfg/yolov3-tiny.cfg cfg/yolov3-tiny_mask.cfg
8,9c8,9
< width=416
< height=416
---
> width=224
> height=224
94c94
< size=2
---
> size=1
127c127
< filters=255
---
> filters=24
135c135
< classes=80
---
> classes=3
171c171
< filters=255
---
> filters=24
177c177
< classes=80
---
> classes=3
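As a quick check of the filter rule mentioned above, the new value can be derived directly (a minimal sketch in Python; the numbers simply follow the 3*(n_classes + 5) formula):
# Each of the 3 anchors per scale predicts (n_classes + 5) values:
# 4 box coordinates + 1 objectness score + n_classes class scores.
n_classes = 3  # good / bad / none
filters = 3 * (n_classes + 5)
print(filters)  # -> 24, matching filters=24 above (the original 255 is 3 * (80 + 5))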
- Train the YOLOv3-tiny model on the custom dataset
1. Build darknet. To accelerate training with a GPU, enable the GPU and CUDNN options in the Makefile.
GPU=1
CUDNN=1
OPENCV=1
OPENMP=1
DEBUG=0
Set the proper ARCH value according to your GPU. In my case I use an RTX 2070 (compute capability 7.5), so I add the following line to the Makefile.
ARCH= -gencode arch=compute_75,code=[sm_75,compute_75]
After editing the Makefile, build darknet.
make -j 8
2. After darknet is built, train YOLOv3-tiny on the custom dataset. For training, edit the beginning of cfg/yolov3-tiny_mask.cfg: comment out lines 3-4 and uncomment lines 6-7.
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=16
subdivisions=2
NOTE: When you train on a GPU, decrease the batch value or increase the subdivisions value according to your GPU memory size; darknet processes batch/subdivisions images per GPU pass (e.g. batch=16, subdivisions=2 means 8 images at a time). If GPU memory is not enough, a "CUDA Error: out of memory" error occurs.
3. Edit the paths in Mask/obj.data:
classes = 3
train = <your-working-directory-path>/darknet/Mask/train.txt
valid = <your-working-directory-path>/darknet/Mask/test.txt
names = <your-working-directory-path>/darknet/Mask/obj.names
backup = <your-working-directory-path>/darknet/backup/
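If the image paths inside Mask/train.txt and Mask/test.txt do not match your working directory, a small helper like the one below can rewrite them (a hypothetical sketch, not part of the original dataset; WORK_DIR and the Mask/yolo image folder are assumptions, so adjust them to how the dataset is actually laid out on your machine):
# Hypothetical helper: re-anchor every image path in the darknet list files
# under your own working directory. Adjust the paths to your setup.
import os

WORK_DIR = "/home/user/work/darknet/Mask"   # assumption: your Mask directory
IMG_DIR = os.path.join(WORK_DIR, "yolo")    # assumption: images live in Mask/yolo

for list_name in ("train.txt", "test.txt"):
    list_path = os.path.join(WORK_DIR, list_name)
    with open(list_path) as f:
        names = [os.path.basename(l.strip()) for l in f if l.strip()]
    with open(list_path, "w") as f:
        for name in names:
            f.write(os.path.join(IMG_DIR, name) + "\n")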
4. Download the pretrained weights and extract the first 15 layers for training on the custom dataset. The extracted weight file yolov3-tiny.conv.15 is used as the initial weights for training.
wget https://pjreddie.com/media/files/yolov3-tiny.weights
./darknet partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15
Run training by the following command:
./darknet detector train "./Mask/obj.data" "./cfg/yolov3-tiny_mask.cfg" "./yolov3-tiny.conv.15"
A smaller avg (average loss) value on the console indicates better training progress. This time I finished training after 60,000 iterations.
- Test your trained model on the host machine
After training, test the trained model. Modify cfg/yolov3-tiny_mask.cfg for testing:
[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=16
#subdivisions=2
Run test for image:
./darknet detector test ./Mask/obj.data ./cfg/yolov3-tiny_mask.cfg backup/yolov3-tiny_mask_60000.weights ./Mask/demo/Mask_121.jpg
You will get the predicted result like this:
2. set up Vitis-AI development environment and Ultra96-V2 hardware image
The construction of the Vitis-AI environment on the host machine and of the DNN model execution environment on the Ultra96 is described in detail on the following page. You can also download a pre-built SD card image of the Ultra96 runtime environment from the link.
https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-1-007b0e
Create the SD card image and save "custom.json", the DPU configuration file; it is used as the arch argument when compiling in step 3.
3. quantize and compile YOLOv3-tiny model
This step is performed in the docker environment.
This step references the following tutorial. NOTE: the commands here differ slightly from the referenced tutorial because it does not use the Vitis-AI environment.
https://www.hackster.io/LogicTronix/yolov3-tiny-tutorial-darknet-to-caffe-for-xilinx-dnndk-4529df
Launch the docker environment from the host machine.
./docker_run.sh xilinx/vitis-ai-gpu:latest
Activate the Vitis-AI caffe environment in docker.
conda activate vitis-ai-caffe
- Convert darknet model to caffe model
Download "convert.py", the darknet-to-caffe conversion script.
https://github.com/Xilinx/Vitis-AI/tree/v1.2.1/alveo/apps/yolo/darknet_to_caffe
Modify lines 453 and 455 to avoid an error caused by the maxpool padding size not being an integer: in Python 3 the / operator returns a float while Caffe expects an integer pad, so integer division // is used.
453c453
< pooling_param['pad'] = str((int(block['size'])-1)/2)
---
> pooling_param['pad'] = str((int(block['size'])-1)//2)
455c455
< pooling_param['pad'] = str((int(block['size'])-1)/2)
---
> pooling_param['pad'] = str((int(block['size'])-1)//2)
Copy the model cfg file and the trained weights to the working directory, then run the conversion.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# mkdir caffe_converted
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# python convert.py yolov3-tiny_mask.cfg yolov3-tiny_mask_60000.weights caffe_converted/yolov3.prototxt caffe_converted/yolov3.caffemodel
- Quantize caffe model
First, create a directory for quantization and copy the converted caffe model into it.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# mkdir quantization
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# cp caffe_converted/yolov3.prototxt ./quantization/
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/work# cp caffe_converted/yolov3.caffemodel ./quantization/
Edit ./quantization/yolov3.prototxt for quantization: comment out lines 2-7 (the original Input layer) and add the ImageData layer as shown below.
name: "Darkent2Caffe"
# layer {
# name: "data"
# type: "Input"
# top: "data"
# input_param: { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
# }
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: false
yolo_height:224 #change height according to Darknet model
yolo_width:224 #change width according to Darknet model
}
image_data_param {
source:"/workspace/darknet/Mask/calib_data.txt" #change path accordingly
root_folder:"/workspace/darknet/Mask/yolo/" #change path accordingly
batch_size: 1
shuffle: false
}
}
Prepare calib_data.txt for the calibration process during quantization. The file looks like this:
Mask_193.jpg 0
Mask_360.jpg 0
Mask_342.jpg 0
...
"0" means that the image is used in the calibration process. root_folder is specified as the root path of training image.
Run quantization by the following command.
vai_q_caffe quantize \
-model ./yolov3.prototxt \
-weights ./yolov3.caffemodel \
-gpu 0 \
-calib_iter 1000
After the quantization process completes, you will see the following lines on the console.
--------------------------------------------------
Output Quantized Train&Test Model: "./quantize_results/quantize_train_test.prototxt"
Output Quantized Train&Test Weights: "./quantize_results/quantize_train_test.caffemodel"
Output Deploy Weights: "./quantize_results/deploy.caffemodel"
Output Deploy Model: "./quantize_results/deploy.prototxt"
- Compile quantized model for DPU
First, create a directory for compilation and copy the quantized caffe model into it.
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/xilinx_competition# mkdir compile
(vitis-ai-caffe) root@lp6m-ryzen:/workspace/xilinx_competition# cp quantization/quantize_results/deploy.* ./compile
Modify deploy.prototxt for compilation: comment out lines 5-9 (the transform_param block), as shown below.
layer {
name: "data"
type: "Input"
top: "data"
# transform_param {
# mirror: false
# yolo_height: 224
# yolo_width: 224
# }
Run compilation with the following command. In the arch argument, specify the json file that represents the DPU configuration created in step 2.
vai_c_caffe --prototxt=./deploy.prototxt \
--caffemodel=./deploy.caffemodel \
--output_dir=compiled \
--net_name=yolov3tiny \
--arch=/workspace/ultra96v2/ultra96v2_vitis_flow_tutorial_1/custom.json
You will see the following output after compilation.
Kernel topology "yolov3tiny_kernel_graph.jpg" for network "yolov3tiny"
kernel list info for network "yolov3tiny"
Kernel ID : Name
0 : yolov3tiny
Kernel Name : yolov3tiny
--------------------------------------------------------------------------------
Kernel Type : DPUKernel
Code Size : 0.08MB
Param Size : 8.27MB
Workload MACs : 1578.74MOPS
IO Memory Space : 0.34MB
Mean Value : 0, 0, 0,
Total Tensor Count : 19
Boundary Input Tensor(s) (H*W*C)
data:0(0) : 224*224*3
Boundary Output Tensor(s) (H*W*C)
layer15_conv:0(0) : 7*7*24
layer22_conv:0(1) : 14*14*24
Total Node Count : 17
Input Node(s) (H*W*C)
layer0_conv(0) : 224*224*3
Output Node(s) (H*W*C)
layer15_conv(0) : 7*7*24
layer22_conv(0) : 14*14*24
**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
"./compiled/dpu_yolov3tiny.elf" file is generated by this process, and is used for building application
4. create YOLOv3-tiny face mask detection application
Create the YOLOv3-tiny face mask detection application based on the following sample.
https://github.com/Xilinx/Vitis-AI/tree/v1.1/mpsoc/vitis_ai_dnndk_samples/adas_detection
The sample application is based on YOLOv3, not YOLOv3-tiny.
My code is uploaded to my GitHub repository. Based on the sample application, I modified some code for YOLOv3-tiny and added an inference mode for images captured from a webcam.
https://github.com/lp6m/tiny_yolov3_face_mask_detect
Copy the compiled model dpu_yolov3tiny.elf from step 3 to the model/ directory.
├── Makefile
├── model
│ └── dpu_yolov3tiny.elf
├── src
│ ├── main.cc
│ └── utils.h
├── Mask_121.jpg
└── youtube_320.mp4
Here, I explain some parts of the modified code.
- modify the output nodes in main.cc
const string outputs_node[2] = {"layer15_conv", "layer22_conv"};
- modify the anchors in utils.h: these values correspond to the anchors in "yolov3-tiny_mask.cfg" (see the decoding sketch below)
vector<float> biases{81,82, 135, 169, 344,319, 10,14, 23,27, 37,58};
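For reference, these biases feed the standard YOLOv3 box decoding. The sketch below illustrates the math for a single grid cell (an illustration of the standard formula, not the exact code in main.cc; the pairing of the large anchors with the coarse 7x7 output follows the usual tiny-YOLOv3 convention):
# Standard YOLOv3 box decoding for one grid cell (illustration only).
# tx, ty, tw, th are raw network outputs; (cx, cy) is the cell index;
# (anchor_w, anchor_h) is one pair taken from the biases vector above.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, grid_w, grid_h, anchor_w, anchor_h,
               net_w=224, net_h=224):
    bx = (sigmoid(tx) + cx) / grid_w      # box center x, normalized to 0..1
    by = (sigmoid(ty) + cy) / grid_h      # box center y, normalized to 0..1
    bw = anchor_w * math.exp(tw) / net_w  # box width, normalized to 0..1
    bh = anchor_h * math.exp(th) / net_h  # box height, normalized to 0..1
    return bx, by, bw, bh

# e.g. a detection in cell (3, 4) of the 7x7 output with anchor (81, 82):
print(decode_box(0.2, -0.1, 0.3, 0.1, 3, 4, 7, 7, 81, 82))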
I also added a real-time object detection mode for images captured from the USB camera.
- modify Makefile for cross compile
CXX ?= g++
CC ?= gcc
- Build the application on the host machine with the cross-compile environment.
You can cross-compile after setting up the sysroot environment on the host machine.
wget -O sdk.sh https://www.xilinx.com/bin/public/openDownload?filename=sdk.sh
chmod +x sdk.sh
./sdk.sh -d ~/work/petalinux_sdk_vai_1_1_dnndk
unset LD_LIBRARY_PATH
source ~/work/petalinux_sdk_vai_1_1_dnndk/environment-setup-aarch64-xilinx-linux
wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
cd vitis-ai_v1.1_dnndk
./install.sh $SDKTARGETSYSROOT
cd yolov3_face_mask_detection
make
5. deploy YOLOv3-tiny application to Ultra96-V2
- Create SD card
Divide the SD card into two partitions as described in the tutorial referenced in step 2: the first partition (BOOT) is vfat, and the second partition (rootfs) is ext4.
#First partition
sudo cp prj/Vitis/binary_container_1/sd_card/BOOT.BIN /media/<username>/BOOT/
sudo cp prj/Vitis/binary_container_1/sd_card/image.ub /media/<username>/BOOT/
#Second partition
sudo tar xvf rootfs.tar.gz -C /media/<username>/rootfs/
#copy DNNDDK(Vitis-AI) runtime to sdcard
wget -O vitis-ai_v1.1_dnndk.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis-ai_v1.1_dnndk.tar.gz
tar -xvzf vitis-ai_v1.1_dnndk.tar.gz
sudo cp -r ./vitis-ai_v1.1_dnndk /media/<username>/rootfs/home/root
#copy dpu bitstream to sdcard
sudo cp prj/Vitis/binary_container_1/sd_card/dpu.xclbin /media/<username>/rootfs/usr/lib
#copy application to sdcard
sudo cp -r ./yolov3_face_mask_detection /media/<username>/rootfs/home/root/
#wait for copy completion (you can safely remove SD card)
sync
- Install Vitis-AI DNNDK runtime on Ultra96
Insert the SD card into the Ultra96-V2, connect the power supply, a DP-to-HDMI display adapter, and a USB keyboard/mouse/camera, then turn on the device.
Install the Vitis-AI DNNDK runtime on the device from the console. This process is required only on the first boot.
cd vitis-ai_v1.1_dnndk
./install.sh
- Run application
Before you run the application, you have to change the display resolution.
export DISPLAY=:0.0
xrandr --output DP-1 --mode 640x480
Run the face mask detection. There are three modes: image/video/USB camera.
# image mode
./yolo Mask/Mask_121.jpg i
# video mode
./yolo youtube_320.mp4 v
# webcam realtime mode
./yolo camera c
This video shows real-time face mask detection on images captured from the USB camera. The FPS is displayed in the upper left of the screen; 22-24 fps is achieved.