The Real-Time Physical Distancing Monitor (RTPDM) is a system for monitoring and evaluating compliance with physical distancing (also known as social distancing) policies in open public areas. The system comprises one or more Detection Nodes and a Cloud Server. Each Detection Node consists of a camera and an embedded computer, and its job is to detect pedestrians and compute the real distances between them. This data is uploaded to the Cloud Server and stored in a database for further analysis and visualization. The Cloud Server presents a home page showing a map of the city with a marker at every location where a Detection Node is installed. System users can access the home page and click any marker to open the corresponding node's monitoring page, which shows graphical data about physical distancing at that location.
This graphical data comprises a timeline plot of the total number of pedestrians detected by the system, the total number of physical distancing violations among them, and the violations as a percentage of all possible one-to-one interactions between the detected pedestrians. The system is intended for authorities in charge of enforcing physical distancing policies, so they can evaluate, correct and re-issue better policies. It is also meant for regular citizens, who can access real-time data about how crowded a given public spot is; this gives them insight into the risk of infection at that place, based on the degree of physical distancing violations. Because the monitoring page can also show historical data, the place's behavior regarding physical distancing can be evaluated over time; for instance, to determine on which days and at which hours the physical distancing violation index is highest, and hence the risk of infection greatest. With this information at hand, authorities can devise alternatives to reduce crowds at peak hours, and citizens can voluntarily avoid places and/or times of day with higher infection risk.
To protect the privacy of individuals, the system blurs the detected pedestrians in the image and sends only a low-resolution copy of it to the Cloud Server for visualization purposes. No other images or data regarding the detected individuals are stored permanently in the Detection Nodes or in the Cloud Server.
Block Diagram and Functional Description

Figure 1 shows the system's block diagram. The Detection Nodes are composed of a Raspberry Pi 3 B+ or 4 B computer and a camera. I used a Logitech C270 webcam for the prototype, but any regular webcam or a Raspberry Pi Camera V1.2 or V2.0 should work as well.
The Detection Node takes the video stream from its camera and runs it through a deep learning computer vision model to detect pedestrians. Image frame "pixel coordinates" are obtained for each detected pedestrian, and then an inverse Homography transformation from the 2D "image frame" coordinate system to the 3D "world" coordinate system is applied to each of them to calculate their real-world Cartesian coordinates in meters. These coordinates allow the calculation of precise Euclidean distances (with errors within a few centimeters) between all pedestrians, in order to determine how many violations of the minimum physical distance (1.8 meters in our case) are occurring in the currently analyzed frame.
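To make the violation index concrete, here is a minimal sketch of the per-frame distance check, assuming `world_points` is a list of (x, y) ground-plane coordinates in meters for every detected pedestrian. The helper name and the example coordinates are illustrative, not the exact code in 'detect.py'; only the 1.8 m threshold and the "percentage of all possible pairs" definition come from the description above.

```python
# Sketch: count physical distancing violations among detected pedestrians.
from itertools import combinations
import math

MIN_DISTANCE_M = 1.8  # minimum physical distancing threshold

def count_violations(world_points):
    """Return (total pedestrians, violations, violations as % of all pairs)."""
    n = len(world_points)
    pairs = list(combinations(range(n), 2))        # all possible one-to-one interactions
    violations = 0
    for i, j in pairs:
        dx = world_points[i][0] - world_points[j][0]
        dy = world_points[i][1] - world_points[j][1]
        if math.hypot(dx, dy) < MIN_DISTANCE_M:    # Euclidean distance on the ground plane
            violations += 1
    percent = 100.0 * violations / len(pairs) if pairs else 0.0
    return n, violations, percent

# Example: three pedestrians, two of them only 1 m apart -> 1 violation out of 3 pairs
print(count_violations([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]))
```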
Once this analysis is done for the current image, time-stamped image frame coordinates (u, v) and Cartesian coordinates (x, y, z) for every detected pedestrian are sent to the Cloud Server, along with the physical distancing violation index data and a low-resolution copy of the analyzed image frame. The Cloud Server stores the time stamp and the physical distancing violation index data in a database, and the image frame picture in its local file system.
After accessing the system's home page, real-time data from every Detection Node in the system can be accessed with a mouse click. For this prototype, I built just one Detection Node to monitor in real time and used pre-recorded data for the rest of the nodes for demonstration purposes. I'm also using a pre-recorded video for the demonstration, due to the quarantine restrictions in my country that made it very hard to test the system in a real-life setting.
For the Cloud Server implementation I used a regular hosting service capable of running PHP & MySQL. Because the computer vision detection task runs "at the edge" on the Raspberry Pi, the load on the Cloud Server and the required Internet bandwidth are very low.
See Video 1 below for a demo of the system running.
The Detection Nodes

Figure 2 shows the Detection Node hardware prototype, which is very simple: it comprises mainly the Raspberry Pi computer, the webcam and some accessories. Almost everything in this project is software. The Detection Node should be installed 3-4 meters above the ground, angled downwards toward the street area we want to monitor (see Figure 3). A Python script runs on the Detection Node to read the video stream from the camera, apply the deep learning inference for pedestrian detection, apply the inverse Homography transformation to compute the pedestrians' real-world coordinates, and send all data to the Cloud Server via HTTP POST requests.
One of the first tasks in the Python script is to open the 'Logitech_C270_intrinsics_1280x720.yaml' camera calibration file containing the camera matrix and distortion coefficients. This file is created automatically by the camera calibration procedure (more on camera calibration later). After that, the chosen deep learning model is set up for inference, and the video stream is opened, either from a video file (for testing) or from the webcam. The images taken by the camera are run through the SSD Mobilenet V1 deep learning model, trained with the MS COCO dataset, to detect pedestrians. The center-bottom point of the rectangle enclosing each detected pedestrian is taken as that pedestrian's pair of image frame coordinates (u, v). Next, an inverse Homography transformation from the 2D camera plane to the real 3D world space is applied to each image frame coordinate pair. This gives us the "real" 3D coordinates of each pedestrian, which in turn allows the calculation of "real" Euclidean distances between all pedestrians, in order to determine how many violations of the minimum physical distance (1.8 meters in our case) are occurring in the current frame.
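The snippet below is a minimal sketch of how each detection's center-bottom point might be mapped to world coordinates with OpenCV, assuming `H` is the 3x3 homography from the ground plane (in meters) to the image plane obtained during calibration, and `boxes` holds pedestrian bounding boxes as (x, y, w, h) in pixels. The function and variable names are illustrative, not taken from 'detect.py'.

```python
import numpy as np
import cv2

def pedestrians_to_world(boxes, H):
    """Map the feet point of each bounding box from pixels to ground-plane meters."""
    H_inv = np.linalg.inv(H)                        # image plane -> ground plane
    image_points, world_points = [], []
    for (x, y, w, h) in boxes:
        u, v = x + w / 2.0, y + h                   # center-bottom point of the box (feet)
        image_points.append((u, v))
        p = np.array([[[u, v]]], dtype=np.float32)  # shape (1, 1, 2) for perspectiveTransform
        wx, wy = cv2.perspectiveTransform(p, H_inv)[0, 0]
        world_points.append((float(wx), float(wy)))
    return image_points, world_points
```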
Once the analysis of the image frame is done, the frame is saved to the Raspberry Pi's local file system, to be sent later to the Cloud Server. A JSON string containing a time stamp, the (u, v) image frame coordinates and the (x, y, z) world coordinates of every detected pedestrian is also prepared to be sent to the Cloud Server. The frame rate obtained on the Raspberry Pi is about 1 FPS (frame per second), because of the heavy load imposed primarily by the deep learning inference. For that reason, the frequency at which data is sent to the Cloud Server is 1 Hz (once every second), which is very reasonable for this application.
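To make the data flow concrete, here is a hypothetical shape for the two JSON payloads described above. The actual field names are defined in 'detect.py' and the server-side PHP scripts; the names and values below are only illustrative.

```python
import json
import time

timestamp = time.strftime("%Y-%m-%d %H:%M:%S")

# Distancing index data (one record per analyzed frame)
indexes_payload = json.dumps({
    "timestamp": timestamp,
    "pedestrians": 7,          # total detected in the frame
    "violations": 3,           # pairs closer than 1.8 m
    "violation_percent": 14.3  # violations / all possible pairs * 100
})

# Per-pedestrian coordinates (one entry per detected pedestrian)
points_payload = json.dumps({
    "timestamp": timestamp,
    "points": [
        {"u": 512, "v": 431, "x": 2.35, "y": 4.10, "z": 0.0}
    ]
})
```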
The Python script 'detect.py' contains all code for computer vision pedestrian detection, Homography calculations for obtaining real 3D coordinates and sending data to the Cloud Server.
For the inverse Homography transformation to work accurately, the camera must be properly calibrated to obtain the camera intrinsics. Included in the source code there's a 'cameraCalibration.py' file to do the calibration, which saves to disk a 'yaml' calibration file containing the camera matrix, the distortion coefficients, the image size used for the calibration process and the root mean square (RMS) error, which is a measure of the achieved calibration accuracy. This file is a modified version of the one included in the official OpenCV code repository. The camera calibration procedure must be done just once; it needs to be repeated only if you change the camera. It can be done with the Raspberry Pi or with any other computer, as long as the same camera is used.
To do the camera calibration, a set of chessboard photos is taken and placed in a folder next to the 'cameraCalibration.py' script. When the calibration finishes, the 'Logitech_C270_intrinsics_1280x720.yaml' file containing the obtained camera intrinsics is generated in the same folder. This file must be copied, along with the 'detect.py' Python script, to the Raspberry Pi. See the 'install_run.md' file included in the source code for the suggested steps and/or links to web resources for doing this calibration.
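For reference, this is a condensed sketch of the standard OpenCV chessboard calibration that 'cameraCalibration.py' is based on. The 9x6 pattern size and the photo folder path are assumptions to adjust to your own setup, and the sketch only prints the RMS error instead of writing the 'yaml' file.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners of the chessboard (assumed pattern size)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
gray = None
for fname in glob.glob("calibration_photos/*.jpg"):  # assumes the folder holds the chessboard photos
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

rms, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS error:", rms)  # lower is better, ideally below one pixel
```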
Video 2 shows what the camera calibration procedure looks like.
Homography Transformation

If we have the real-world XYZ coordinates of at least four co-planar points (say, four points at the same level on the street we want to monitor), and we get from the camera image frame the pixel coordinates of the same four points, then with those four point correspondences (in two different coordinate systems) we can obtain the camera pose (position and orientation) in world coordinates by computing the Homography transformation between the plane formed by the street in 3D world coordinates and the virtual plane formed by the image frame in pixel coordinates. Once the camera pose is obtained, the 3D world coordinates of any other point measured at floor level in the image frame (in pixel coordinates) can easily be obtained by applying the inverse of the Homography transformation. This procedure gives us the real-world 3D coordinates of the pedestrians in meters, which in turn allows us to calculate real Euclidean distances between them with errors of a few centimeters, depending on the accuracy of the camera calibration and the co-planar point measurements.
In other words, we must pick four points at floor level, get their real XY coordinates in meters, take a picture with the camera in its final position and orientation, get the pixel coordinates of all four points, and change the corresponding values in the 'detect.py' script for the Homography transformation to work correctly (a minimal sketch of this step is shown below). See the 'install_run.md' file in the source code for more details. This Homography calibration must also be done just once, unless the camera or its position and/or orientation is changed. Figure 4 shows the graphical and mathematical description of a Homography transformation.
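The sketch below shows how the Homography can be computed from the four point pairs with OpenCV. The reference measurements are made-up example values; the actual reference points used by 'detect.py' are set inside that script.

```python
import numpy as np
import cv2

# Four ground-level reference points, measured in meters (example values only)
world_pts = np.array([[0.0, 0.0], [6.0, 0.0], [6.0, 4.0], [0.0, 4.0]], dtype=np.float32)
# The same four points read off a reference image, in pixels (example values only)
image_pts = np.array([[215, 650], [1080, 640], [930, 210], [330, 220]], dtype=np.float32)

H, _ = cv2.findHomography(world_pts, image_pts)  # ground plane (m) -> image plane (px)
H_inv = np.linalg.inv(H)                         # image plane (px) -> ground plane (m)

# Any other ground-level pixel point can now be mapped back to meters:
pt = np.array([[[640.0, 500.0]]], dtype=np.float32)
print(cv2.perspectiveTransform(pt, H_inv)[0, 0])
```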
Once the JSON pedestrian data is ready after processing a given image frame, that data is sent to the Cloud Server with a couple of HTTP POST requests, along with the image frame 'JPG' file from which the data was obtained.
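A minimal sketch of that upload step is shown below, assuming the three PHP endpoints described in the next section are hosted under `SERVER_URL`. The placeholder URL, the request headers and the multipart field name are assumptions; only the endpoint file names come from this write-up.

```python
import requests

SERVER_URL = "http://<your-server-path>"  # replace with your own hosting path

def send_frame_data(indexes_payload, points_payload, frame_path):
    # Distancing index record for the database
    requests.post(f"{SERVER_URL}/receive_indexes_json.php",
                  data=indexes_payload,
                  headers={"Content-Type": "application/json"}, timeout=5)
    # Per-pedestrian (u, v) and (x, y, z) coordinates
    requests.post(f"{SERVER_URL}/receive_points_json.php",
                  data=points_payload,
                  headers={"Content-Type": "application/json"}, timeout=5)
    # Low-resolution JPG of the analyzed frame
    with open(frame_path, "rb") as f:
        requests.post(f"{SERVER_URL}/receive_image.php",
                      files={"image": f}, timeout=5)
```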
The Cloud Server has five PHP scripts and two HTML pages in charge of receiving pedestrian data from the Detection Nodes, storing it in the database and visualizing it on the corresponding web pages. Figure 5 shows the system’s home page with the markers showing the position of every Detection Node present in the system (i.e., cameras installed in different places in a given city). Figure 6 shows the monitoring page for a given Detection Node with its corresponding physical distancing data. Each Detection Node present in the system has a monitoring page accessible by clicking the map markers on the home page. As you can see in Figures 5 and 6, the current weather data and weather forecast data for the specific latitude and longitude coordinates of the city or the given Detection Node are also shown on the corresponding web pages.
The files comprising the software running at the Cloud Server are the following:
‘receive_indexes_json.php’: This PHP script receives from the Detection Node a JSON string containing the data required to compute the physical distancing indexes: a time stamp, the total number of detected pedestrians, the number of detected distancing violations (i.e., pairs of pedestrians closer than 1.8 meters to each other) and the number of violations as a percentage of all possible one-to-one interactions between pedestrians (see Figure 7). The script receives the data and stores it in a MySQL database table. This data is then used to generate the first plot shown in Figure 6.
‘receive_points_json.php’: This PHP script is in charge of receiving the JSON string containing the time stamp, image frame coordinates (u, v) and computed Cartesian coordinates (x, y, z) for every detected pedestrian (see Figure 8). Once received, the data is stored in a JSON file in the server’s local file system. This data is then used to generate the second plot shown in Figure 6 (the first plot in the second row).
‘receive_image.php’: This script receives the image frame from which the current pedestrian data received by the server was obtained. This image is shown in the third plot in Figure 6 (the second plot in the second row).
‘index.html’: This is the system’s home page (see Figure 5), which shows all places in the city where a Detection Node has been installed. The file contains HTML and JavaScript code and uses the Google Maps API to draw the map and the Detection Node markers. When a marker is clicked, a pop-up "info window" shows the address, ID and latitude/longitude of the given node, with a link to its monitoring page. Clicking the link opens the corresponding node visualization page, which shows the physical distancing data from that particular node. The page also shows the current weather data and weather forecast data for the city, obtained with the OpenWeather API.
‘node.html’: This is the web page in charge of displaying the physical distancing data plots for a particular Detection Node selected from the home page. It uses the ‘Plotly’ JavaScript library to draw the plots and updates the data in real time by performing AJAX queries to the server every few seconds. If you hover over the first and second plots with the mouse cursor, you can see specific data for the region of the graph the cursor is pointing at.
‘query_indexes.php’: This PHP script is asynchronously called via AJAX request from the ‘node.html’ file to obtain distancing indexes from the server. It reads physical distancing index data from the database and answers the query with a JSON string containing the aforementioned data.
‘query_points.php’: This script is also called via AJAX from the ‘node.html’ file to obtain from the server the image coordinates (u, v) and Cartesian coordinates (x, y, z) for every pedestrian detected in the current image frame. It also answers the query with a JSON string containing the aforementioned data.
MySQL Database

The system's MySQL database has one table for each Detection Node present in the system. Figure 9 shows the SQL query to create the table for each Detection Node, which also gives insight into its structure. For the system to work, a database must be created at the Cloud Server and a user/password configured with sufficient privileges to access the database. The 'login.php' file in the source code must then be changed to reflect the correct database name, user name and password.
These are the main steps taken to implement this project:
1. Prepare the Detection Node
- Burn the latest Raspbian operating system onto a micro SD card for the Raspberry Pi computer.
- Boot the Raspberry Pi and install OpenCV along with the 'yaml', 'requests' and 'json' Python packages.
- Copy from the source code repository the 'detection-node' folder to the Raspberry Pi's home folder or any other location you prefer.
- Perform the camera calibration procedure.
- Perform the Homography calibration procedure.
2. Prepare the Cloud Server
- Upload to your web server (you can use almost any regular web hosting server) the content of the 'server' folder in the code repository to any folder or location you prefer.
- Open the 'detect.py' Python script in the Raspberry Pi and change all references to my server to yours. For instance, in code line 318 change 'http://tec.bo/covid19-challenge/receive_indexes_json.php' to 'http://<your-server-path>/receive_indexes_json.php', and so on for every other reference to the web server.
- In your web server create a database for the project, along with a user/password with reading and writing privileges. Change the 'login.php' file in the server to reflect the new database name, user name and password for accessing the database.
- In the database create a table for each Detection Node you want to run by running the SQL query shown in Figure 9 (use a different table name for each node).
- Create a Google Maps API account and obtain a key for accessing the API. Add your API key to the 'index.html' and 'node.html' files.
- Create an OpenWeather API account and obtain a key for accessing the API. Add your API key to the 'index.html' and 'node.html' files.
3. Run a test of the system
- In the Raspberry Pi, go to the folder where the Python code is located and run 'detect.py'. A window should open showing the detection image frame with the detected pedestrians.
- Open a web browser and point to your web server's root address. You should see the system's home page.
I included in the source code the file 'install_run.md' with instructions for building and running the system. Please refer to that file for detailed instructions.
Conclusions and Future Improvements

The prototype performs very well and the system has great potential for use in a real application. One of the system's weaknesses, however, is the relatively low detection accuracy of the deep learning model used for this prototype (SSD Mobilenet V1 trained with the MS COCO dataset). The Raspberry Pi 3 B+ computer by itself doesn't have enough power to run a deep learning model accurate enough for a real-world deployment. Lightweight deep learning models have reasonable FPS performance but lower accuracy, with many false positives; more accurate models demand more computing power. Nevertheless, much can be improved by upgrading the system, for instance with a newer Raspberry Pi 4 B and a Google Coral USB Accelerator or an Intel® Neural Compute Stick 2 as a companion processor for the Raspberry Pi. Another option is to replace the Raspberry Pi with a more powerful computer, such as an NVIDIA Jetson Nano or a Google Coral Dev Board. There's also the possibility of training and optimizing a custom deep learning model specifically for detecting pedestrians more efficiently (sadly, I didn't have enough time to try that option before the submission date).
As for functionality, there are a number of improvements that can be made to the system. For instance, the process of calibrating the camera and measuring the reference points for the Homography calibration could be automated. Additional artificial intelligence software could also be written to analyze the massive amount of data coming from the Detection Nodes, in order to detect patterns, analyze the physical distancing data more deeply, and correlate it with other types of data, such as the increase/decrease of infections or deaths over longer periods of time.