The goal of this project was to create a system that businesses and schools can use to monitor mask usage in public places, to identify where people are congregating and where stronger enforcement is required. However, the system must also preserve people's privacy: the data must be anonymized and aggregated when it is logged, without storing exact data about specific individuals. Finally, it must be able to connect to existing security systems (where installed) or run as a standalone system using generic cameras (for places that lack a sufficient security system).
Overview
This system centers on the Xilinx ZCU104 board, running a PetaLinux image with Vitis AI and the software needed to access the VCU. It streams video, either from connected USB cameras or as compressed video from networked IP cameras, and feeds it into a face-detection neural network that was quantized and compiled using Vitis AI. (Note that this network only detects the locations of faces; it does not attempt to identify a face or save any information about it, thereby preserving privacy.) The patches this network extracts (the faces themselves) are cropped out, resized, and fed into a second, very small neural network that classifies each face by whether a mask is being worn, plus an experimental feature that detects whether a mask is being worn incorrectly (i.e. not covering the nose). The results are annotated back onto the video stream as simple colored boxes, logged to a spreadsheet for later analysis, and streamed back out so they can be viewed through a standard security-camera remote interface.
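To make the per-frame flow concrete, here is a minimal Python sketch of the detect-crop-classify-annotate loop described above. The `detect_faces` and `classify_mask` functions are hypothetical stand-ins for the two DPU-accelerated networks, and the CSV layout is illustrative; the point is that only anonymous, aggregated counts ever get logged.

```python
import csv
import time

import cv2

# BGR colors for the annotation boxes: green = masked, red = unmasked,
# yellow = mask worn incorrectly (experimental).
COLORS = {"masked": (0, 255, 0), "unmasked": (0, 0, 255), "incorrect": (0, 255, 255)}

def process_frame(frame, detect_faces, classify_mask, log_writer):
    """Annotate one frame in place and log aggregate counts only."""
    counts = {label: 0 for label in COLORS}
    # detect_faces is assumed to return bounding boxes as (x, y, w, h).
    for (x, y, w, h) in detect_faces(frame):
        # Crop the face patch and resize to the classifier's 32x32x3 input.
        patch = cv2.resize(frame[y:y + h, x:x + w], (32, 32))
        label = classify_mask(patch)  # assumed to return a COLORS key
        counts[label] += 1
        # Draw a simple colored box; the face pixels themselves are
        # never written anywhere, preserving privacy.
        cv2.rectangle(frame, (x, y), (x + w, y + h), COLORS[label], 2)
    # Log one aggregate row per frame (anonymized by construction).
    log_writer.writerow([time.time(), counts["masked"],
                         counts["unmasked"], counts["incorrect"]])
    return frame

# Hypothetical usage: append aggregate rows to the analysis spreadsheet.
with open("mask_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    # frame = process_frame(frame, detect_faces, classify_mask, writer)
```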
Training process
The main inference pipeline has two parts. One is the DenseNet model that detects faces (the heavy part of the workload, which requires the DPU to be usable at all); the other is the tiny conv-net that classifies whether a face is masked (it takes a 32x32x3 image and outputs a classification). The DenseNet is based on a standard face-detection DenseNet model, and the training code for the face-mask classifier can be found in the training directory of the code repository.
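For reference, a network of this kind might look like the Keras sketch below. The exact architecture lives in the repository's training directory; this version is an assumption that just illustrates the 32x32x3 input and the three-class output (masked, unmasked, incorrectly worn).

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3  # masked, unmasked, incorrectly worn (experimental)

def build_mask_classifier():
    # A deliberately tiny conv-net: two conv/pool stages, then a small
    # dense head. Small enough to run per-face on every frame.
    return tf.keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(16, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_mask_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# After training, the model is quantized and compiled for the DPU with
# the Vitis AI toolchain before deployment on the ZCU104.
```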
Inference on FPGA
The model runs on the FPGA by starting a GStreamer pipeline that pulls frames from either a local camera or an IP camera, then running the face-detect and mask-classify binaries (with the output of face-detect piped into mask-classify). This starts a remote-streaming server with the annotated stream, which can be connected back into an existing security system.
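The sketch below shows one way the streaming endpoints could be wired up with GStreamer through OpenCV. The device paths, addresses, and exact element chains are illustrative assumptions, not the project's actual pipelines; `omxh264dec`/`omxh264enc` are the VCU-backed OpenMAX elements available on ZCU104 PetaLinux images.

```python
import cv2

# Input: either a local USB camera...
USB_PIPELINE = ("v4l2src device=/dev/video0 ! videoconvert "
                "! video/x-raw,format=BGR ! appsink")
# ...or compressed H.264 from a networked IP camera, decoded on the VCU.
# (The RTSP address is a placeholder.)
IP_PIPELINE = ("rtspsrc location=rtsp://192.168.1.64/stream ! rtph264depay "
               "! h264parse ! omxh264dec ! videoconvert "
               "! video/x-raw,format=BGR ! appsink")
# Output: re-encode the annotated frames on the VCU and stream them out.
OUT_PIPELINE = ("appsrc ! videoconvert ! omxh264enc ! h264parse "
                "! rtph264pay ! udpsink host=192.168.1.10 port=5000")

cap = cv2.VideoCapture(USB_PIPELINE, cv2.CAP_GSTREAMER)
out = cv2.VideoWriter(OUT_PIPELINE, cv2.CAP_GSTREAMER, 0, 30.0, (1280, 720))

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # In the real system, face-detect and mask-classify run here and
    # draw the colored annotation boxes before re-encoding.
    out.write(cv2.resize(frame, (1280, 720)))

cap.release()
out.release()
```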