With universities and other institutions starting to return to normal as vaccines and other measures are put in place, social distancing at sporting events and other outdoor gatherings will still be required. For the HoverGames drone competition, the provided drone and its onboard companion computer were used to create a solution for monitoring crowds and determining how well social distancing measures are being followed. Advances in AI and computer vision now allow the complex scenes captured by drones to be processed to detect people and the distances between them, enabling a unique application of drones and aerial imaging to social distancing monitoring.
Project Summary:
With the NXP HoverGames drone kit and the NavQ (8MMNavQ) companion computer provided with the competition, an application was created that uses machine learning to perform object detection and classification on aerial images. First, the drone kit was assembled and the NavQ was set up. Next, the PyTorch implementation of RetinaNet was trained on the Stanford Drone Dataset (SDD). Small-scale tests were then conducted in adherence with social distancing guidelines and university aviation requirements, and videos from these tests were used to validate the network. This process allowed the HoverGames drone kit to be used for gathering data and creating a proof-of-concept social distancing monitoring system.
Step One: Assemble Drone and Set Up NavQ
For assembling the HoverGames drone kit, please refer to the guide provided by NXP here. For setting up the NavQ companion computer, refer to the guide here.
Step Two: Download PyTorch and the Stanford Drone Dataset
For the object detection and person recognition, please refer to the PyTorch website for installation instructions, found here. To download the test, training, and validation sets for the Stanford Drone Dataset (SDD), please refer to this link. The dataset was prepared by Priya Dwivedi for a Keras implementation of RetinaNet on the SDD; the GitHub repository for that project is found here.
Step Three: Train RetinaNet on the SDD
RetinaNet was selected for detection because others have successfully trained the network on the SDD and because of its ability to perform feature recognition on complex scenes that other detectors would miss. Further information about the RetinaNet architecture can be found here. The training script is included with this guide. A handful of modifications were made to RetinaNet to improve model accuracy on the SDD. First, the minimum bounding box size was decreased from the original 32x32 pixels to 16x16 pixels. Likewise, the maximum bounding box size was decreased from 512x512 pixels to 256x256 pixels. These changes were implemented with the following lines:
from torchvision import models
from torchvision.models.detection.anchor_utils import AnchorGenerator

model = models.detection.retinanet_resnet50_fpn(num_classes=7, pretrained=False, pretrained_backbone=True)

#! Generate smaller anchors -- copied directly from the model setup
anchor_sizes = tuple((x, int(x * 2 ** (1.0 / 3)), int(x * 2 ** (2.0 / 3))) for x in [16, 32, 64, 128, 256])
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

#! Update the anchor generator inside the model
model.anchor_generator = anchor_generator
These changes were determined by researching other implementations of RetinaNet on different datasets. For future changes to the RetinaNet architecture, the PyTorch source code documents the different parameters of the network and how they can be changed. Furthermore, depending on the accuracy and computational resources available, the model backbone can be reduced from ResNet-50 to ResNet-34 or ResNet-18 to decrease the number of trainable parameters. This change can be implemented with the following lines of code:
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.ops.feature_pyramid_network import LastLevelP6P7

model.backbone = resnet_fpn_backbone('resnet18', pretrained=True, returned_layers=[2, 3, 4], trainable_layers=0, extra_blocks=LastLevelP6P7(256, 256))
Finally, if finding the number of trainable parameters is required, then the following addition can be made to the training code:
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable: ", pytorch_total_params)
Training RetinaNet on the SDD required creating a custom dataloader. The image paths, bounding boxes, and labels are imported from the respective .csv files. The implementation of the custom dataloader is included in the training code; please refer to its comments for a detailed description of what each part performs. Running the training script trains the model for a specified number of epochs with a given batch size. Training time will vary with your computational resources, since the dataset is very large and memory requirements differ between machines. Depending on your resources, training could take anywhere from a couple of days to a week to reach a desirable accuracy.
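For orientation, the sketch below shows one way such a dataloader could be structured. It is a minimal example, not the exact code in the attached training script, and it assumes the annotation .csv files contain one row per bounding box with an image path, the box corners, and a numeric class label; the column and file names are placeholders.

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Assumed annotation layout: one row per box -> image_path, x1, y1, x2, y2, numeric label
class SDDDataset(Dataset):
    def __init__(self, csv_file):
        ann = pd.read_csv(csv_file, header=None,
                          names=["image_path", "x1", "y1", "x2", "y2", "label"])
        # Group every bounding box belonging to the same image together
        self.groups = list(ann.groupby("image_path"))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.groups)

    def __getitem__(self, idx):
        path, rows = self.groups[idx]
        image = self.to_tensor(Image.open(path).convert("RGB"))
        target = {
            "boxes": torch.as_tensor(rows[["x1", "y1", "x2", "y2"]].values, dtype=torch.float32),
            "labels": torch.as_tensor(rows["label"].values, dtype=torch.int64),
        }
        return image, target

def collate_fn(batch):
    # Torchvision detection models expect lists of images and targets, not stacked tensors
    return tuple(zip(*batch))

loader = DataLoader(SDDDataset("train_annotations.csv"), batch_size=2,
                    shuffle=True, collate_fn=collate_fn)

# During training, torchvision's RetinaNet takes both lists and returns a dict of losses:
#   losses = model(list(images), list(targets))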
Step Four: Gather Validation Data
Once model training has been completed, it is time to validate and test the model. The drone can be programmed in QGroundControl to follow a specific path and hover at a given altitude, and the program also reports an estimated flight time for the mission. With this estimated flight time, a Python script can be run on the NavQ with the camera aimed downwards, as shown in this image:
To connect to the NavQ board, create a mobile hotspot on either a laptop or mobile phone and connect both the NavQ and a laptop to it. SSH into the NavQ to execute the Python script. It is recommended to use tmux as the terminal multiplexer, since it allows processes to continue running even if the connection is broken or terminated. Instructions for using tmux can be found here.
Create a mission in QGroundControl following this guide and upload the mission to the drone. I achieved the best results when hovering between 15 and 30 m in the air. When flying the drone, please check the local FAA regulations for your area. When the script finishes, the recorded file will be saved on the NavQ. It is recommended to create a GitHub repository to aid in transferring the videos from the NavQ for analysis.
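As a reference for the recording script mentioned above, here is a minimal sketch. It assumes the NavQ camera is reachable through OpenCV; the capture source, output file name, frame rate, and recording length are placeholders and will likely need to be adjusted for your setup (for example, replacing the device index with a GStreamer pipeline string).

import cv2

# Capture source: device index 0 is an assumption; on the NavQ the camera may
# instead be exposed through a GStreamer pipeline string.
capture = cv2.VideoCapture(0)

frame_width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("flight_recording.avi",
                         cv2.VideoWriter_fourcc(*"MJPG"),
                         30.0, (frame_width, frame_height))

# Record for roughly the mission time estimated in QGroundControl (seconds)
record_seconds = 300
start = cv2.getTickCount()
while (cv2.getTickCount() - start) / cv2.getTickFrequency() < record_seconds:
    ok, frame = capture.read()
    if not ok:
        break
    writer.write(frame)

capture.release()
writer.release()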
Once you have a video recorded with people in the scene, run the following Python code to save each frame as a separate image for validating the network:
import cv2

path = "path_to_video_file"
dir = "directory_to_save_images/"

vidcap = cv2.VideoCapture(path)
success, image = vidcap.read()
count = 0
while success:
    cv2.imwrite(dir + "frame%d.jpg" % count, image)  # save frame as JPEG file
    success, image = vidcap.read()
    print('Read a new frame: ', success)
    count += 1
The script converts the video into a set of individual frame images.
Validation is performed in the included Jupyter Notebook. Running the notebook samples the images from your test. Below is a sample image from a test on a personal validation image set:
Based on the accuracy that you achieve, you may need to adjust the network parameters, the number of epochs, or other aspects of training.
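Independent of the included notebook, the sketch below illustrates what running the trained model on one of the extracted frames could look like. The weight file name, frame path, and score threshold are placeholders, and the comments note an assumption about the anchor settings from Step Three.

import torch
from PIL import Image
from torchvision import models, transforms

# Rebuild the network with the same settings used for training
model = models.detection.retinanet_resnet50_fpn(num_classes=7, pretrained=False,
                                                pretrained_backbone=False)
# If the anchor sizes were changed for training (Step Three), re-apply the same
# anchor generator here so the predictions are decoded correctly.
model.load_state_dict(torch.load("retinanet_sdd.pth", map_location="cpu"))
model.eval()

# Run the detector on one of the frames extracted from the flight video
image = transforms.ToTensor()(Image.open("frames/frame0.jpg").convert("RGB"))
with torch.no_grad():
    prediction = model([image])[0]

# Keep only confident detections
keep = prediction["scores"] > 0.5
print(f"{int(keep.sum())} detections above the 0.5 score threshold")
print(prediction["boxes"][keep])
print(prediction["labels"][keep])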
Step Five: Live Feed Detection
The final step in the project is to establish a live feed between the drone and a computer performing the analysis. From there, the distances between people can be calculated from the height of the drone and the camera characteristics. Due to unforeseen weather events over the past three weeks, this final stage, implementing the live stream and the detection of social distancing, was not fully implemented. This section will be updated, weather permitting, over the following weeks to include the documentation needed to achieve real-time person detection and social distancing. The equation for determining distancing is included in the final cell of the Jupyter Notebook for further implementation. The cover image was originally used as a sample image to determine the distances between people and verify that the equation functioned as expected.
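The exact equation lives in the notebook's final cell; as a rough illustration of the idea, the sketch below uses a simplified nadir-view approximation in which the ground distance covered by one pixel follows from the hover altitude and the camera's horizontal field of view. The altitude, field of view, and image width are assumed values, not measurements from this project, and square pixels are assumed so the same scale applies in both image directions.

import math

altitude_m = 20.0          # hover altitude set in the QGroundControl mission
horizontal_fov_deg = 62.0  # assumed camera field of view; use the real camera spec
image_width_px = 1920

# Ground width covered by the image, and the resulting meters-per-pixel scale
ground_width_m = 2.0 * altitude_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)
meters_per_pixel = ground_width_m / image_width_px

def person_distance(center_a, center_b):
    """Approximate ground distance (m) between two detected people,
    given the pixel centers of their bounding boxes."""
    dx = (center_a[0] - center_b[0]) * meters_per_pixel
    dy = (center_a[1] - center_b[1]) * meters_per_pixel
    return math.hypot(dx, dy)

# Example: flag a pair of detections closer than 2 m
print(person_distance((400, 300), (520, 360)))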
Acknowledgements:
I would like to thank Liberty University and the Center for Research and Scholarship for support in this project. I would also like to thank my research advisor and TRACER lab director, Dr. Medina, for his guidance in the project.