With the increasing number of cars, trucks, and other vehicles on the roads, the problem of traffic jams keeps growing. Proper traffic light control reduces congestion and raises capacity at intersections, which tend to be traffic bottlenecks. Due to the high cost of continuous crossroad monitoring and the infrastructure it requires, a better approach is to use an unmanned aerial vehicle that can monitor, track, and report the number of vehicles and the time it takes them to cross the intersection. This allows better management of traffic lights, especially during rush hours or road renovations. The task requires a real-time object tracking algorithm that returns results on an ongoing basis. The project therefore contains an implementation of the DeepSORT object tracking algorithm based on YOLOv4 detections, which ensures a real-time response. The detector inference class is implemented in several frameworks (TensorFlow, TensorFlow Lite, TensorRT, OpenCV, and OpenVINO) in order to benchmark the methods and use the best one for edge-tailored solutions.
Step 0: Introduction to Multiple Object Tracking
MOT algorithms, a relevant part of current computer vision research, are used in autonomous driving and steering, surveillance, and behavior analysis. The MOT problem is typically divided into sub-tasks: detecting multiple objects (locating and classifying them), assigning and maintaining their identities, and following their individual trajectories across consecutive frames.
The DeepSORT algorithm is a detector-based method that uses a recursive Kalman filter with a constant-velocity motion model and a linear observation model. The data association (re-identification) task is solved with the Hungarian algorithm. To improve the performance of the assignment step, a weighted metric is used that combines the Mahalanobis distance and the cosine distance. The first metric provides motion information by measuring the distance between predicted Kalman states and newly arrived measurements. The second uses a pre-trained deep convolutional network as an image feature descriptor to provide appearance information.
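As an illustration, here is a minimal sketch of that combined association cost; the lam weight below is a hypothetical tuning parameter, not a value taken from this project:
import numpy as np

def combined_cost(maha_dist, cosine_dist, lam=0.5):
    # Convex combination of the motion (Mahalanobis) and appearance
    # (cosine) distances, as in the DeepSORT association step.
    return lam * np.asarray(maha_dist) + (1.0 - lam) * np.asarray(cosine_dist)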
Step 1: Preparation of the detection algorithm
As the object detector, the YOLOv4 algorithm was chosen due to its satisfactory accuracy and real-time processing speed. The neural network model was trained with the Darknet framework on the VisDrone dataset, which contains images captured from a UAV's perspective. Each independent object belongs to one of 11 categories; small and occluded parts of images containing many instances were marked as ignored regions.
- ignored_regions
- pedestrian
- people
- bicycle
- car
- van
- truck
- tricycle
- awning_tricycle
- bus
- motor
- others
The YOLOv4 files are in the linked repository: the configuration file is data/darknet/yolov4_visdrone.cfg, the class names file is data/classes/visdrone.names, and the calculated anchor box sizes are in data/anchors/visdrone_anchors.txt.
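As a quick sanity check, the names file can be loaded and its length compared with the class count used later in the conversion commands (-c 12, i.e. 11 categories plus ignored_regions); a minimal sketch:
# Load the VisDrone class names (one name per line).
with open('data/classes/visdrone.names') as f:
    class_names = [line.strip() for line in f if line.strip()]
print(len(class_names))  # expected: 12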
Step 2: Edge devices setup
System setup.
For evaluation purposes, edge devices such as the NVIDIA Jetson Xavier NX and the Intel Neural Compute Stick 2 were used. The Jetson Xavier NX was flashed with JetPack SDK 4.4.1, and the SD card for the Raspberry Pi 4B was flashed with Raspberry Pi OS Lite 5.10. The full TensorFlow library was used on the Raspberry Pi 4B; the build instructions for version 2.2.0 are available here.
Camera setup.
Because of the driver for the Jetson camera (e-CAM24_CUNX – Color Global Shutter Camera), JetPack version 4.4.1 has to be used. e-con Systems, the manufacturer, provides a camera driver for the NVIDIA Jetson Nano and Xavier NX along with simple installation instructions.
Step 3: Edge-tailored detector model optimization and quantization
The optimization and quantization process was performed with TensorRT for the NVIDIA Jetson Xavier NX, OpenVINO for the Intel Neural Compute Stick 2, and TensorFlow Lite for CPU-based solutions.
The TensorRT framework requires converting the model to one of the supported formats, such as ONNX or TensorFlow; in this project, the ONNX format is used. The conversion was done with the yolo_to_onnx.py script available in the repository, using the following call, where -c is the number of classes, -m the input model, and -o the output ONNX model path.
python3 yolo_to_onnx.py -c 12 -m ./yolov4-608 -o ./yolov4.onnx
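Optionally, the exported file can be validated before further conversion; a minimal sketch using the standard onnx checker:
import onnx

# Load the exported model and validate its graph structure.
model = onnx.load('./yolov4.onnx')
onnx.checker.check_model(model)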
To convert the model from ONNX to TensorRT, the onnx_to_tensorrt.py script was used. The TensorRT representation of the model was generated in three data types: FP32, and quantized FP16 and INT8.
- convert ONNX to TensorRT engine with float32 weights
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -q fp32 -o ./yolov4_fp32.trt
- convert ONNX to TensorRT engine with float16 weights
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -q fp16 -o ./yolov4_fp16.trt
- convert ONNX to TensorRT engine with int8 weights (needs path to calibration dataset - representative images from the dataset, marked below as './calib_images')
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -i ./calib_images -q int8 -o ./yolov4_int8.trt
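For reference, a minimal sketch of deserializing one of the generated engines with the TensorRT Python API (TensorRT 7.x, as shipped with JetPack 4.4.1); the input/output buffer handling, which the project's inference class covers, is omitted here:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by onnx_to_tensorrt.py.
with open('./yolov4_fp16.trt', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()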
For OpenVINO, the ONNX file generated following the instructions in tensorrt/README.md was used, together with the OpenVINO Model Optimizer package and the following commands:
- FP32 data format:
python3 mo.py --input_model ./yolov4.onnx --model_name yolov4_fp32 --data_type FP32 --batch 1
- FP16 data format:
python3 mo.py --input_model ./yolov4.onnx --model_name yolov4_fp16 --data_type FP16 --batch 1
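The resulting IR files (.xml/.bin) can then be loaded with the OpenVINO Inference Engine; a minimal sketch using the Python API of that OpenVINO generation, with the file names following from the mo.py calls above:
from openvino.inference_engine import IECore

ie = IECore()
# Read the IR produced by the Model Optimizer.
net = ie.read_network(model='yolov4_fp16.xml', weights='yolov4_fp16.bin')
# MYRIAD targets the Neural Compute Stick 2; CPU and GPU also work.
exec_net = ie.load_network(network=net, device_name='MYRIAD')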
Conversion to TensorFlow Lite was done using the ONNX model and the onnx-tensorflow package via its command-line interface:
onnx-tf convert -i /path/to/input.onnx -o /path/to/output
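Equivalently, the same step can be done from Python with the onnx-tensorflow API; a minimal sketch:
import onnx
from onnx_tf.backend import prepare

# Convert the ONNX graph to a TensorFlow representation
# and export it as a SavedModel directory.
onnx_model = onnx.load('/path/to/input.onnx')
tf_rep = prepare(onnx_model)
tf_rep.export_graph('/path/to/output')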
Either way, the model is converted from the ONNX format to the TensorFlow SavedModel representation. Conversion to TF Lite is then done with TensorFlow's built-in TFLiteConverter.
- FP32 format
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_PATH)
tflite_model = converter.convert()
# Save the model.
with open(OUTPUT_PATH, 'wb') as f:
f.write(tflite_model)
- FP16 format
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_PATH)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
# Save the model.
with open(OUTPUT_PATH, 'wb') as f:
f.write(tflite_model)
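A minimal sketch of running the converted model with the TF Lite interpreter; the file name is assumed here, and the input is random data just to exercise the model:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./yolov4_fp16.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the model's expected shape and dtype.
dummy = np.random.random_sample(input_details[0]['shape']).astype(
    input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]['index'])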
Step 4: How to use the DeepSORT tracker
The DeepSORT algorithm takes detection results from YOLOv4 and associates them using a recursive Kalman filter and the Hungarian algorithm.
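The Hungarian (optimal assignment) step can be illustrated with SciPy; a minimal sketch on a hypothetical track-detection cost matrix (DeepSORT's actual implementation additionally applies gating and a matching cascade):
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: existing tracks, columns: new detections;
# entries are combined motion/appearance costs.
cost = np.array([[0.2, 0.9, 0.8],
                 [0.7, 0.1, 0.6]])
track_idx, det_idx = linear_sum_assignment(cost)
# Pairs each track with a detection so that the total cost is minimal.
print(list(zip(track_idx, det_idx)))  # [(0, 0), (1, 1)]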
To run the multiple object tracking algorithm, use the object_tracker.py script as follows:
python3 object_tracker.py -f trt -m ./yolov4_int8.trt -s 608 -n ./data/classes/visdrone.names -v <PATH TO INPUT VIDEO OR CAMERA> --dont_show True
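To additionally save the annotated video, the -o and --output_format options listed below can be combined with the same call; the output path and codec here are illustrative:
python3 object_tracker.py -f trt -m ./yolov4_int8.trt -s 608 -n ./data/classes/visdrone.names -v <PATH TO INPUT VIDEO OR CAMERA> -o ./output.mp4 --output_format mp4v --dont_show True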
The results of the object tracker script are presented in the video below; the run was performed on an NVIDIA Jetson Xavier NX using TensorRT in FP32 mode.
NOTE
If one wants to run inference on an NVIDIA device with TensorRT support, the TrtYOLO detector import in detectors/__init__.py needs to be uncommented. The same applies to using the OpenvinoYOLO class on Intel hardware.
Command-line arguments
Usage: object_tracker.py [OPTIONS]
Options:
-f, --framework TEXT Inference framework: {tf, tflite, trt, opencv,
openvino}
-m, --model_path TEXT Path to detection model
-n, --yolo_names TEXT Path to YOLO class names file
-s, --size INTEGER Model input size
-v, --video_path TEXT Path to input video
-o, --output TEXT         Path to the output (annotated) video
--output_format TEXT Codec used in VideoWriter when saving video to
file
--tiny BOOLEAN If YOLO tiny architecture
--model_type TEXT yolov3 or yolov4
--iou FLOAT IoU threshold
--score_threshold FLOAT Confidence score threshold
--opencv_dnn_target TEXT Precision of OpenCV DNN model
--device TEXT OpenVINO inference device, available: {MYRIAD,
CPU, GPU}
--dont_show BOOLEAN Do not show video output
--info BOOLEAN Show detailed info of tracked objects
--count BOOLEAN Count objects being tracked on screen
--help Show this message and exit.
Step 5: Performance tests
The benchmark tests were performed on the NVIDIA Jetson Xavier NX and the Intel Neural Compute Stick 2. The Jetson Xavier NX was set to power mode 2 (sudo nvpmodel -m 2), and the fan and clocks were set to the maximum frequency with the sudo jetson_clocks --fan command. The Intel NCS 2 was evaluated with a Raspberry Pi 4B as the host. The evaluation results are listed below.
During the performance evaluation, the energy efficiency of benchmarked edge devices was checked. The power consumption of Intel Neural Compute Stick 2 and Raspberry Pi 4B was measured with the use of a USB multifunction tester as is shown in the figures below.
Jetson Xavier NX energy usage was inspected with the jetson-stats package, which is intended for monitoring and controlling NVIDIA Jetson devices. The results of the performed tests are presented in the table below.
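A minimal sketch of reading live board statistics with jetson-stats (the exact keys available in jetson.stats, including the power readings, depend on the package version):
from jtop import jtop

# Open a connection to the jtop service and poll board statistics,
# which include utilization and power readings.
with jtop() as jetson:
    while jetson.ok():
        print(jetson.stats)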
For comparison, graphics cards used for inference and computing in cloud data centers, such as the NVIDIA V100 or RTX 3080, have power consumption at the level of 300 and 320 watts, respectively.
Use cases of a tracking algorithm
The possible use cases of multiple object tracking are:
- surveillance monitoring
- crossroads flow tracking
- monitoring and warning in unsafe places
Many thanks for their great work to:
- The AI Guy: yolov4-deepsort, MIT License
- nwojke: deep_sort, MIT License
- jkjung-avt: tensorrt_demos, MIT License