Facing the current pandemic is a hard task that challenges all of humanity. At the current phase of the Covid-19 pandemic it is more important than ever to find and stop any newly emerging chains of infection. Experts and governments around the world have decided that wearing masks in public places vastly decreases the growth of such undetected chains.
However, many people do not support these governmental regulations, and since fines are very high, making sure that every customer wears a mask ties up valuable workforce.
Overview and Goal
We decided to do our part and help companies prevent people from entering company property without wearing a facial covering.
To achieve our goal, we use a machine learning algorithm to detect people on a (security camera) image without an appropriate facial covering.
To reach as many businesses as possible, we decided to make our algorithm available as a hosted REST service, ensuring minimal local setup requirements. Easy-to-use Python, C# and Assembler client libraries are to be added as well.
This approach gives freedom to all users, as the setup can be adapted to the environment. The centralized server ensures an easy setup and always up-to-date recognition models, as well as Telegram alerts once a person without a mask is detected.
For use cases where cameras record confidential property or data protection is important, users can host their own server using our open-source solution, making sure that the images never leave the building or company.
Accessing the public API
The API is hosted at the public endpoint https://api.hephaistos.online and can be accessed by everyone with an account on our website https://www.hephaistos.online.
The website helps you manage your personal data as well as your API token. After verifying your image quality via the interactive upload form, you can start using one of the provided Python or C# libraries to automatically upload your camera feed. Setting up an automatic Telegram alert is another feature provided by our website.
If the provided libraries do not satisfy your needs, you can always develop your own. The following specification (Swagger/OpenAPI definition) should help when designing your own library.
API-Endpoint
https://api.hephaistos.online/api/hephaistos/detection
Header
Key: "Authorization", Value: "Token xxxx"
Body (form-data)
Key: "file", Value: [byte Stream of jpg]
If our public service does not match your criteria, you can follow the steps below on how to train your own model.
If you are generally satisfied with our provided model, but your use case does not allow transmitting your images over the internet, you can grab our repository and host our docker-compose setup on any server within your company.
Gathering training data
Training the model is a key factor. We therefore gathered as much data as possible from datasets on https://www.kaggle.com/. At the time of writing we are using the following datasets, resulting in over 9000 images of training data.
- Mask Datasets V1
- With/Without Mask
- Face mask dataset
- covid-19_mask_detection
- face_mask_and_kerchief
- Face Mask ~12K Images Dataset
Using this amount of data is a challenge in itself, which we solved by indexing all of the data into a single CSV file that contains information about every single image.
Using the generated index file, we can request a pandas.DataFrame holding all images that match certain criteria (e.g. exactly one human per picture), giving us meaningful training data including a proper classification.
As indexing is a special case for each dataset, it cannot be discussed within this article, but interested readers can view the code in our public repository at Hephaistos/Detection/sanitize_data.py.
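As an illustration of such a query, a filter on the index file could look like this (the file name and column names are hypothetical; the real schema is defined in sanitize_data.py):

import pandas as pd

# Load the combined index of all Kaggle datasets.
index = pd.read_csv("index.csv")

# Keep only images with exactly one human and a known mask label
# (hypothetical columns "person_count" and "label").
single_person = index[(index["person_count"] == 1) & (index["label"].notna())]

print(f"{len(single_person)} of {len(index)} images match the criteria")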
Train the model
TensorFlow Keras and PIL are the two main Python libraries we used to train our mask detection model.
We use data augmentation to prevent overfitting by manipulating the pictures randomly. These manipulations include rotations of up to 15 degrees in both directions, shifts of up to 15% in both the horizontal and vertical direction, horizontal flips, and zooming by up to 20%.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_generator = ImageDataGenerator(
    rescale=1./255,         # normalize pixel values to [0, 1]
    rotation_range=15,      # rotate up to 15 degrees in both directions
    width_shift_range=.15,  # shift horizontally by up to 15%
    height_shift_range=.15, # shift vertically by up to 15%
    horizontal_flip=True,   # randomly mirror the picture
    zoom_range=0.2          # zoom by up to 20%
)
All filepaths of pictures with exactly one person are loaded from the index file into a pandas DataFrame. 15% of the pictures are used for validation and 10% for testing at the end. An ImageDataGenerator is used for each of these frames to scale the pictures into 256x256 color images with categorical class data.
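A sketch of this split and generator setup, reusing the image_generator from above (the DataFrame and column names are assumptions on our part):

# 75% training, 15% validation, 10% test (hypothetical column names).
train_df = df.sample(frac=0.75, random_state=42)
rest = df.drop(train_df.index)
val_df = rest.sample(frac=0.6, random_state=42)  # 15% of the total
test_df = rest.drop(val_df.index)                # remaining 10%

train_data = image_generator.flow_from_dataframe(
    train_df,
    x_col="filepath",        # column holding the image path
    y_col="label",           # column holding the class
    target_size=(256, 256),  # scale every picture to 256x256
    class_mode="categorical",
)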
Explaining the Model Structure
We use a Sequential model, which is one of the simplest forms of neural networks. Data only flows forward, like a river: each layer has exactly one input and one output tensor. A Sequential model should not be used if the model has multiple inputs or outputs, if any layer has multiple inputs or outputs, if layer sharing is required, or if a non-linear topology is wanted.
The 2D convolution convolves the input layer to create a tensor of outputs. In our case we chose different filter counts but kept the kernel size, which specifies the height and width of the convolution window, constant. The goal of this convolution is feature extraction. For example, the digit 3 consists of three horizontal lines and two vertical ones; such features could be extracted and used later in the model to determine which number is shown on a picture.
2D max pooling with a default window of 2x2 halves both width and height and picks the highest value of each window to represent that area.
Dropout prevents overfitting by randomly disconnecting neurons in the hidden layers. This forces the network to find robust features that remain useful in combination with varying sets of other neurons. In each epoch, 20 to 30% of the neurons get disabled for this step.
The Flatten function reduces the dimensions of the tensor to one, converting e.g. a 16x16x16 object into 4096 outputs, so Dense layers can work with them. At this point we could have all the possible horizontal and vertical lines of a picture of the number 3; we now train the model so it understands that these lines in specific combinations are indeed the number 3.
Model 1 with Adam optimizer:
This model is very basic. It consists of only Conv2D layers and one big Dense layer before the output one. It was only meant to be a test network.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model_1 = Sequential([
    Conv2D(32, 3, padding='same', strides=2, activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', strides=2, activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', strides=2, activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(num_classes, activation='softmax')
])
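Compiling and training Model 1 then follows the usual Keras pattern; a minimal sketch, with train_data and val_data being the generators set up above (the 10 epochs are mentioned further below):

model_1.compile(
    optimizer="adam",
    loss="categorical_crossentropy",  # matches the softmax output
    metrics=["accuracy"],
)
history = model_1.fit(train_data, validation_data=val_data, epochs=10)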
Model 2 with RMSprop optimizer:
This architecture was the basis for most of the models we have trained with different data so far. It combines the usual Conv2D and Dense layers with Dropout.
model_2 = Sequential([
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', strides=4, activation='relu'),
MaxPooling2D(),
Dropout(0.2),
Conv2D(128, 3, padding='same', strides=2, activation='relu'),
MaxPooling2D(),
Dropout(0.2),
Flatten(),
Dense(512, activation='relu'),
Dropout(0.3),
Dense(64, activation='relu'),
Dropout(0.3),
Dense(num_classes, activation='softmax')
])
Models 3 to 5 are variations of Model 2 with the Adam optimizer. In the following, only the base model is displayed; all other steps are the same.
Model 3 with Adam optimizer:
In this model we decided to reduce the dimensionality of the Conv2D output space as well as the number of Dense layers down from 3 to 2.
model_3 = Sequential([
    Conv2D(32, 3, padding='same', activation='relu'),
    Conv2D(32, 3, padding='same', strides=2, activation='relu'),
    Conv2D(64, 3, padding='same', strides=4, activation='relu'),
    Flatten(),  # flatten the feature maps for the Dense layers
    Dense(512, activation='relu'),
    Dense(num_classes, activation='softmax')
])
Model 4 with Adam optimizer:
This model has higher Conv2D dimensions in the beginning instead of the end and also an additional Dense layer.
model_4 = Sequential([
    Conv2D(128, 3, padding='same', activation='relu'),
    Conv2D(64, 3, padding='same', strides=2, activation='relu'),
    Conv2D(32, 3, padding='same', strides=4, activation='relu'),
    Flatten(),  # flatten the feature maps for the Dense layers
    Dense(256, activation='relu'),
    Dense(32, activation='relu'),
    Dense(8, activation='relu'),
    Dense(num_classes, activation='softmax')
])
Model 5 with Adam optimizer:
This model has only the Conv2D dimensions switched.
model_5 = Sequential([
    Conv2D(32, 3, padding='same', activation='relu'),
    Conv2D(128, 3, padding='same', strides=2, activation='relu'),
    Conv2D(64, 3, padding='same', strides=4, activation='relu'),
    Flatten(),  # flatten the feature maps for the Dense layers
    Dense(512, activation='relu'),
    Dense(32, activation='relu'),
    Dense(num_classes, activation='softmax')
])
Model 6 with Grayscale:
model_gray = Sequential([
    Conv2D(32, 3, padding='same', activation='relu'),
    Conv2D(64, 3, padding='same', strides=2, activation='relu'),
    Conv2D(128, 3, padding='same', strides=4, activation='relu'),
    Flatten(),  # flatten the feature maps for the Dense layers
    Dense(512, activation='relu'),
    Dense(16, activation='relu'),
    Dense(num_classes, activation='softmax')
])
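The grayscale variant presumably gets single-channel input; with the Keras generators used above this would be the color_mode argument (an assumption, the original pipeline code is not shown):

gray_data = image_generator.flow_from_dataframe(
    train_df,
    x_col="filepath",
    y_col="label",
    target_size=(256, 256),
    color_mode="grayscale",  # 256x256x1 input instead of 256x256x3
    class_mode="categorical",
)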
The first model ran for 10 epochs. The other models used different callback methods to adjust the learning.
- Model Checkpoint: we save the weights of the epoch with the best validation accuracy to restore them at the end of training.
- Early Stopping: if the validation loss does not improve for 10 epochs, we stop training. The default maximum is 50 epochs.
- ReduceLROnPlateau: if the validation loss did not decrease for 3 epochs, we cut the learning rate in half (see the sketch below).
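A sketch of this callback setup (the checkpoint file name and exact arguments are our assumptions):

from tensorflow.keras.callbacks import (
    ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
)

callbacks = [
    # Keep the weights of the epoch with the best validation accuracy.
    ModelCheckpoint("best_weights.h5", monitor="val_accuracy",
                    save_best_only=True, save_weights_only=True),
    # Stop if the validation loss has not improved for 10 epochs.
    EarlyStopping(monitor="val_loss", patience=10),
    # Halve the learning rate after 3 epochs without improvement.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

history = model_2.fit(train_data, validation_data=val_data,
                      epochs=50, callbacks=callbacks)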
The results of the different models can be seen in the pictures below.
Models with the Adam optimizer often jump all over the place, making it difficult to see whether they are really finding the minimum of the loss function. Early stopping kicked in at around epoch 15 because the minimum of the validation loss had already been found at epoch 5, or, even more strikingly, at epoch 3 in Model 3.
We trained Model 2 again with the Adam optimizer and saw that the validation accuracy is pretty much the same. The loss, on the other hand, is about 25% lower. That means that, theoretically, the Adam optimizer yields better results.
In the next step we try to combine the predictions of multiple models to compensate for failures of individual models. After evaluating the different models multiple times, we decided to use models 2, 5 and 6 (grayscale). The confusion matrices for these models look very promising: about 99% of people without a mask are detected correctly. Some masks, sadly, are not detected correctly. This could have multiple reasons. The main issue is that there are many different face masks and every mask looks different. While the detection works very well with white or blue surgical masks, black or transparent masks are a problem.
To evaluate the final prediction result we add the probabilities of these 3 models together. The class with the highest combined confidence is chosen for the picture.
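A minimal sketch of this ensemble step (model names follow the listings above; images and gray_images stand for preprocessed batches, which is our assumption):

import numpy as np

# Each model returns per-class probabilities for a batch of images.
summed = (
    model_2.predict(images)
    + model_5.predict(images)
    + model_gray.predict(gray_images)  # grayscale model needs 1-channel input
)

# Pick the class with the highest combined confidence per image.
final_classes = np.argmax(summed, axis=1)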
To counter the flaws of the model we decided to use a pretrained face detection model to filter out people with obvious masks. The face detection tries to find a face on the picture. If the face is covered by a mask, scarf or similar, the model likely does not detect a face. This is perfect for our case, because it means that either there is no person on the picture or the person has covered up their mouth and nose. If the face detection does detect a face, the image is forwarded to our mask detection and evaluated.
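The pretrained face detector is not named here; as an illustrative stand-in, the same pre-filter could be built with OpenCV's bundled Haar cascade:

import cv2

# Pretrained frontal-face Haar cascade shipped with OpenCV (illustrative choice).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def faces_to_check(image_bgr):
    """Return detected face crops; an empty list means no uncovered face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [image_bgr[y:y + h, x:x + w] for (x, y, w, h) in faces]

Each crop returned by this filter would then be passed on to the mask detection ensemble.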
Host your own server
The simplest way to create the server is to use the docker-compose file provided in the GitHub repository. Before you can start your server, you must change the configuration settings for the REST API. To do this, adapt the .env file to your setup. Alternatively, you can create a hidden folder .env.local and copy the file inside (not tracked by git). In this configuration you can change the secret keys, the database connection and the server host name. You can also change the port of the Python server; it is specified in single_image_detection.py. After you have successfully configured the server, you must initialize the database with the SQL script. Now you are ready to go.
Try it out and let us know if it works.