Here is a video of the final hardware running the custom trained model:
As we got started in lockdown I had a lot of time to think about projects that would combine my interest in machine learning with hardware hacking. I thought that a small device that could monitor egress points (entrances and exits) and automatically identify people complying with mask requirements might be an interesting way to let technology handle some of the complex human interactions that were taking place while trying to keep people safe.
I decided that I wanted to create a "Covid Mask Detector" that would use a small, cheap Raspberry Pi and a custom-built object detection model trained on faces both wearing and not wearing masks.
Machine Learning Model
To create a machine learning model in TensorFlow I first needed to collect a large set of images with faces and annotate the ones wearing masks and the ones not wearing masks. That meant starting by collecting a LOT of images of faces.
Collecting images
To assist in collecting images of faces I wrote a Node.js script that uses Google Images as its source: it takes a search parameter, passes it to the service, and downloads the largest version of each result. The code is available here:
https://github.com/contractorwolf/googleimagedownloader
To use the Node.js script, go to the folder where it is located, run npm install first (installs axios and fs), then run the script like this:
> node getimages covid masks
The command above downloads images relating to "covid masks". I collected images with several different search parameters to make sure I was gathering faces of a wide variety of races, sexes, ages, and styles. I spent a lot of time on the search parameters (not just the one above): I wanted to eliminate as much bias as I could when it came to identifying faces, and to cover medical masks as well as handmade/custom masks.
After I had collected a wide variety of images with both masks and no masks, I went through and removed any images that were not helpful for training the model. I then moved and resized the images to make them uniform (scripts to automate these steps are in the repo as well). Once that was done I needed to annotate the images: identify where the faces were in each image and whether each face had a mask or not.
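The actual scripts live in the repo, but a minimal sketch of the resize pass might look something like this (the folder names and 640px target width here are just placeholders, not the exact values I used), assuming Pillow is installed:

# resize_images.py - sketch of making the collected images uniform
# Folder names and the 640px target width are illustrative placeholders.
import os
from PIL import Image

SOURCE_DIR = "raw_images"      # hypothetical folder of downloaded images
OUTPUT_DIR = "resized_images"  # hypothetical output folder
TARGET_WIDTH = 640             # example target width

os.makedirs(OUTPUT_DIR, exist_ok=True)

for name in os.listdir(SOURCE_DIR):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    img = Image.open(os.path.join(SOURCE_DIR, name)).convert("RGB")
    scale = TARGET_WIDTH / img.width          # keep the aspect ratio
    resized = img.resize((TARGET_WIDTH, int(img.height * scale)))
    resized.save(os.path.join(OUTPUT_DIR, os.path.splitext(name)[0] + ".jpg"))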
Annotating the Images
To annotate or "label" the images I used software called LabelImg (direct download), which is open source, documented, and available on LabelImg's Github page. This software creates an XML file for each image that defines the machine learning "label" (mask or no mask) as well as the bounding box for the face (where in the image the face is located). Annotating a thousand images of masks and no masks took a while, but I knew that model accuracy is directly related to the time spent accurately gathering and labeling the images.
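LabelImg writes its annotations in the standard Pascal VOC XML format, one file per image. As a rough illustration of what those files contain, here is a small sketch that reads an annotation back out (the file name is hypothetical):

# read_annotation.py - sketch of reading a LabelImg (Pascal VOC) XML file
import xml.etree.ElementTree as ET

tree = ET.parse("mask_0001.xml")   # hypothetical annotation file
root = tree.getroot()

print("image:", root.findtext("filename"))
for obj in root.findall("object"):
    label = obj.findtext("name")            # the label, e.g. mask or no mask
    box = obj.find("bndbox")                # bounding box around the face
    xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
    xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
    print(f"  {label}: ({xmin}, {ymin}) -> ({xmax}, {ymax})")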
After the images had been collected and annotated they had to be divided into training and testing folders with an 80/20 split. The resulting folders can be downloaded if you want to look at the output of the process so far:
Testing images and annotation XML files:
https://github.com/contractorwolf/coronavirus-mask-detection/blob/master/images/test.zip
Training images and annotation XML files:
https://github.com/contractorwolf/coronavirus-mask-detection/blob/master/images/train.zip
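The 80/20 split itself is easy to script. Here is a minimal sketch (folder names are placeholders) that shuffles the image/XML pairs and copies them into train and test folders:

# split_dataset.py - sketch of an 80/20 train/test split of image + XML pairs
import os
import random
import shutil

IMAGES_DIR = "resized_images"   # hypothetical folder of annotated images
SPLIT = 0.8                     # 80% train, 20% test

images = [f for f in os.listdir(IMAGES_DIR) if f.lower().endswith(".jpg")]
random.seed(42)                 # make the split repeatable
random.shuffle(images)

cut = int(len(images) * SPLIT)
for subset, names in (("train", images[:cut]), ("test", images[cut:])):
    os.makedirs(subset, exist_ok=True)
    for name in names:
        base = os.path.splitext(name)[0]
        shutil.copy(os.path.join(IMAGES_DIR, name), subset)            # image
        shutil.copy(os.path.join(IMAGES_DIR, base + ".xml"), subset)   # its annotation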
Once I had the images needed to generate a machine learning model that could tell the difference between wearing and not wearing a mask, I had to write the code to generate the model. I decided to modify some existing examples written in TensorFlow. I originally tried to do this on my personal desktop machine with my graphics card, but hit issues, so I decided to try Google's Colab, a web interface for writing a machine learning notebook that can run on Google's GPUs (or TPUs!) for free.
I wrote the notebook to use the images defined above and tried to document what was happening at each step:
https://colab.research.google.com/drive/1uEkP5j7KM9eSkCUtUyauy7-Dyd5XAY6e?usp=sharing
The Colab uses a MobileNet model pretrained on the COCO dataset and retrains the final stage on the images collected above, so that the model specializes in identifying faces with and without masks. Here are the main steps documented in the Colab:
- Install the version of Tensorflow needed
- Setup the object detection project
- Download the Coco model
- Download the images needed
- Train model
- Test model
- Create files for Raspberry Pi
- Create files for TPU on Raspberry Pi
The process takes some time and you need to make sure you read and run each block and look at the outputs. If you have any questions about the Colab, the code, or anything else in this project, feel free to leave them below and I will try to answer them.
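For a rough idea of what the last two steps boil down to: the trained model gets exported to a TensorFlow Lite file for the Raspberry Pi, and that file is then compiled for the Coral TPU. The Colab has the exact commands; the sketch below is simplified and the paths are placeholders:

# export_for_pi.py - rough sketch of the final export steps
# Paths are placeholders; the Colab linked above has the exact commands, and
# a model destined for the Edge TPU also needs full integer quantization
# (with a representative dataset), which is omitted here for brevity.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # quantize for size/speed
tflite_model = converter.convert()

with open("mask_detector.tflite", "wb") as f:
    f.write(tflite_model)

# For the Coral TPU the .tflite file is additionally run through Google's
# Edge TPU compiler on the command line:
#   edgetpu_compiler mask_detector.tflite
# which produces mask_detector_edgetpu.tflite for use on the Pi.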
Hardware
The hardware is basically a Raspberry Pi with the Google Coral TPU (for faster inference). I added the Armor Casing (for looks and device protection) and the Adafruit Mini PiTFT as a user interface so that the device could give basic feedback without needing to be plugged into a monitor.
Google Coral TPU
The Google Coral TPU is connected via USB and handles much of the matrix math that is needed for machine learning inference (predictions). Using the TPU (with the model built for the TPU and the code set up to take advantage of it) allowed the model to make predictions at 15 frames per second. Without the TPU the Raspberry Pi could only process predictions from the USB camera images at about 3 frames per second. The default code in my Github repo is set to use the TPU, but you will see commented-out sections that allow it to run on just the Raspberry Pi without the TPU (albeit quite a bit slower).
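In the Python code the difference between the TPU and non-TPU paths mostly comes down to how the TFLite interpreter is created. A rough sketch (the model file names are placeholders; tflite_runtime and the Coral libedgetpu runtime need to be installed on the Pi):

# interpreter_setup.py - sketch of loading the model with or without the TPU
from tflite_runtime.interpreter import Interpreter, load_delegate

USE_TPU = True

if USE_TPU:
    # Edge-TPU-compiled model, dispatched to the Coral accelerator over USB
    interpreter = Interpreter(
        model_path="mask_detector_edgetpu.tflite",
        experimental_delegates=[load_delegate("libedgetpu.so.1")],
    )
else:
    # Plain TFLite model, runs on the Pi's CPU (noticeably slower)
    interpreter = Interpreter(model_path="mask_detector.tflite")

interpreter.allocate_tensors()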
TFT basic interface
I used the Adafruit Mini PiTFT as the main external interface on this project. The goal was to provide an external readout of the assessment made by the machine learning model, along with some other basic parameters. The box shown in the image on the right indicates whether the face seen by the USB camera was "wearing a COVID mask" (green box shown below).
When the device sees a person in range it will draw a colored box outlining their face. The color of the box indicates whether they are wearing a mask: if it sees a mask, as in the image below, it draws a green box around the face and flashes a green box on the Mini PiTFT as shown above.
If the video captured by the camera shows the primary user without a mask, the device indicates "not wearing a COVID mask" by flashing a red box on the Mini PiTFT, as seen below, as well as drawing a red box around the face when a monitor is attached, as shown in the image below.
The basic idea was that the device could work without being plugged into a monitor or screen by relying on just the Mini PiTFT. The other displayed parameters are as follows:
- N: no mask seen (red), with its estimated certainty
- M: mask seen (green), with its estimated certainty
- S: size of the detected face in the frame, as a percentage
- F: frames per second at which the estimation was running
- T: temperature of the CPU
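Driving the Mini PiTFT from Python is done with Adafruit's ST7789 display library and Pillow. A rough sketch of flashing a status color and printing the stats might look like this (the pins and offsets follow Adafruit's 1.14" Mini PiTFT example; the exact layout in my code differs):

# tft_status.py - sketch of showing the mask/no-mask status on the Mini PiTFT
# Pin choices and display offsets follow Adafruit's 1.14" Mini PiTFT example.
import board
import digitalio
from PIL import Image, ImageDraw
from adafruit_rgb_display import st7789

cs = digitalio.DigitalInOut(board.CE0)
dc = digitalio.DigitalInOut(board.D25)
display = st7789.ST7789(
    board.SPI(), cs=cs, dc=dc, rst=None, baudrate=64000000,
    width=135, height=240, x_offset=53, y_offset=40,
)

def show_status(mask_seen, certainty, size_pct, fps, cpu_temp):
    # Landscape canvas the size of the rotated display
    image = Image.new("RGB", (display.height, display.width))
    draw = ImageDraw.Draw(image)
    color = (0, 255, 0) if mask_seen else (255, 0, 0)   # green = mask, red = no mask
    draw.rectangle((0, 0, image.width, image.height), fill=color)
    label = "M" if mask_seen else "N"
    draw.text((10, 10), f"{label}: {certainty:.0%}", fill=(255, 255, 255))
    draw.text((10, 40), f"S: {size_pct:.0f}%  F: {fps:.1f}", fill=(255, 255, 255))
    draw.text((10, 70), f"T: {cpu_temp:.0f}C", fill=(255, 255, 255))
    display.image(image, rotation=90)

show_status(mask_seen=True, certainty=0.93, size_pct=42, fps=15, cpu_temp=55)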
I originally just bought the shortest double right-angled USB cable I could find and just strapped it onto the case, but I thought I could do better, so I kept hacking.
In the end I just cut the cable from above and rewired the two ends, so it looked like this:
USB is really just four wires and they are color coded (typically red = 5V, black = ground, green = data+, white = data-), so the process is easy. I added a small zip-tie to take most of the flex stress and keep the wires from breaking. Installed, it looks like this:
The Raspberry Pi boots straight into the Python program that monitors the USB webcam:
https://github.com/contractorwolf/rpi-tensorflow-mask-detector/blob/master/maskclassifier.py
It was modified from a basic TensorFlow object detection sample, but uses a version of the model that has been retrained to identify only "mask" or "no mask" faces. It was also modified to use the TPU for quick inference and to output data to the Mini PiTFT screen.
The code starts up and begins processing images coming from the first USB webcam it can identify. Images are loaded and the machine learning model attempts to classify whether it sees a mask (or not). It also draws a bounding box around each identifiable face it sees in the image and classifies whether the largest one is wearing a mask. The PiTFT is used to display calculations about the image; if you do not have the TFT or do not need any additional display, that code can simply be commented out. The TPU code can also be removed if your model is not built for TPU processing, but it will run at a much lower framerate. When I ran it on the RPi 4 with the non-TPU model I was seeing a framerate of ~2-3 FPS, whereas the TPU allowed me to process at ~15 FPS.
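Stripped of the TFT and TPU plumbing, the core of the loop is: grab a frame, run the detector, and draw a green or red box depending on the classification. Here is a simplified, self-contained sketch (the model file name, output tensor order, and class mapping are assumptions based on a standard SSD-style TFLite detection model, not copied from my script):

# detection_loop.py - simplified sketch of the capture/classify/draw loop
# Assumes a quantized (uint8) SSD-style TFLite model; file name, tensor
# order and class ids are assumptions, not taken from maskclassifier.py.
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="mask_detector.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()
in_height, in_width = input_detail["shape"][1:3]

camera = cv2.VideoCapture(0)                    # first USB webcam found

while True:
    ok, frame = camera.read()
    if not ok:
        break

    # Resize to the model's input size (detection models usually expect RGB)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (in_width, in_height))
    interpreter.set_tensor(input_detail["index"], np.expand_dims(resized, 0))
    interpreter.invoke()

    boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # ymin,xmin,ymax,xmax (normalized)
    classes = interpreter.get_tensor(output_details[1]["index"])[0]  # 0 = mask, 1 = no mask (assumed)
    scores = interpreter.get_tensor(output_details[2]["index"])[0]

    h, w = frame.shape[:2]
    for box, cls, score in zip(boxes, classes, scores):
        if score < 0.5:
            continue
        ymin, xmin, ymax, xmax = box
        mask_seen = int(cls) == 0
        color = (0, 255, 0) if mask_seen else (0, 0, 255)   # BGR: green / red
        cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)), color, 2)

    cv2.imshow("mask detector", frame)   # only useful when a monitor is attached
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

camera.release()
cv2.destroyAllWindows()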
If you have any additional questions simply drop them in the comments and I will help out when I can. Thanks for looking!