With the increasing number of cars, trucks, and other vehicles on the roads, the problem of traffic jams keeps growing. Proper traffic light control reduces congestion and raises capacity at intersections, which tend to be traffic bottlenecks. Due to the high cost of continuous crossroad monitoring and the infrastructure it requires, a better approach is to use an unmanned aerial vehicle that can monitor, track, and report the number of vehicles and the time it takes them to cross the intersection. This allows better management of traffic lights, especially during rush hours or road renovations. The task requires a real-time object tracking algorithm that returns results on an ongoing basis. The project therefore contains an implementation of the DeepSORT object tracking algorithm based on YOLOv4 detections, which ensures a real-time response. The detector inference class is implemented in several frameworks (TensorFlow, TensorFlow Lite, TensorRT, OpenCV, and OpenVINO) in order to benchmark the methods and use the best one for edge-tailored solutions.
Step 0: Introduction to Multiple Object Tracking
MOT algorithms, a relevant part of current computer vision research, are used in autonomous driving and steering, surveillance, and behavior analysis. The MOT problem is typically divided into sub-tasks: detecting multiple objects (locating and classifying them), assigning and maintaining their identities, and following their individual trajectories across consecutive frames.
The DeepSORT algorithm is a detector-based method that uses a recursive Kalman filter with a constant-velocity motion model and a linear observation model. The data association (re-identification) task is solved with the Hungarian algorithm. To improve the performance of the assignment step, a weighted metric is used that combines the Mahalanobis distance and the cosine distance. The first metric provides motion information by measuring the distance between predicted Kalman states and newly arrived measurements. The second uses a pre-trained deep convolutional network as an image feature descriptor to provide appearance information.
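As an illustration, here is a minimal sketch of that combined association cost; the lam weight below is a hypothetical tuning parameter, not a value taken from this project:
import numpy as np

def combined_cost(maha_dist, cosine_dist, lam=0.5):
    # Convex combination of the motion (Mahalanobis) and appearance
    # (cosine) distances, as in the DeepSORT association step.
    return lam * np.asarray(maha_dist) + (1.0 - lam) * np.asarray(cosine_dist)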
Step 1: Preparation of the detection algorithm
As the object detector, the YOLOv4 algorithm was chosen due to its satisfactory accuracy and real-time processing speed. The neural network model was trained with the Darknet framework on the VisDrone dataset, which contains images captured from a UAV's perspective. Each independent object belongs to one of 11 categories; small and occluded parts of images containing many instances were marked as ignored regions.
- ignored_regions
- pedestrian
- people
- bicycle
- car
- van
- truck
- tricycle
- awning_tricycle
- bus
- motor
- others
The YOLOv4 files are in the linked repository: the configuration file is data/darknet/yolov4_visdrone.cfg, the class names file is data/classes/visdrone.names, and the calculated anchor box sizes are in data/anchors/visdrone_anchors.txt.
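As a quick sanity check, the names file can be loaded and its length compared with the class count used later in the conversion commands (-c 12, i.e. 11 categories plus ignored_regions); a minimal sketch:
# Load the VisDrone class names (one name per line).
with open('data/classes/visdrone.names') as f:
    class_names = [line.strip() for line in f if line.strip()]
print(len(class_names))  # expected: 12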
Step 2: Edge devices setup
System setup.
For evaluation purposes, edge devices such as the NVIDIA Jetson Xavier NX and the Intel Neural Compute Stick 2 were used. The Jetson Xavier NX was flashed with JetPack SDK 4.4.1, and the SD card for the Raspberry Pi 4B was flashed with Raspberry Pi OS Lite 5.10. The full TensorFlow library was used on the Raspberry Pi 4B; the build instructions for version 2.2.0 are available here.
Camera setup.
Because of the driver for the Jetson camera (e-CAM24_CUNX – Color Global Shutter Camera), JetPack version 4.4.1 has to be used. e-con Systems, the manufacturer, provides a camera driver for the NVIDIA Jetson Nano and Xavier NX along with simple installation instructions.
Step 3: Edge-tailored detector model optimization and quantization
The optimization and quantization process was performed with TensorRT for the NVIDIA Jetson Xavier NX, OpenVINO for the Intel Neural Compute Stick 2, and TensorFlow Lite for CPU-based solutions.
The TensorRT framework requires converting the model to one of the supported formats, such as ONNX or TensorFlow; in this project, the ONNX format is used. The conversion was done with the yolo_to_onnx.py script available in the repository, using the following call, where -c is the number of classes, -m the input model, and -o the output ONNX model path.
python3 yolo_to_onnx.py -c 12 -m ./yolov4-608 -o ./yolov4.onnx
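Optionally, the exported file can be validated before further conversion; a minimal sketch using the standard onnx checker:
import onnx

# Load the exported model and validate its graph structure.
model = onnx.load('./yolov4.onnx')
onnx.checker.check_model(model)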
To convert the model from ONNX to TensorRT, the onnx_to_tensorrt.py script was used. The TensorRT representation of the model was generated in three data types: FP32, and quantized FP16 and INT8.
- convert ONNX to TensorRT engine with float32 weights
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -q fp32 -o ./yolov4_fp32.trt
- convert ONNX to TensorRT engine with float16 weights
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -q fp16 -o ./yolov4_fp16.trt
- convert ONNX to TensorRT engine with int8 weights (needs path to calibration dataset - representative images from the dataset, marked below as './calib_images')
python3 onnx_to_tensorrt.py -v -c 12 -m ./yolov4 -i ./calib_images -q int8 -o ./yolov4_int8.trt
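For reference, a minimal sketch of deserializing one of the generated engines with the TensorRT Python API (TensorRT 7.x, as shipped with JetPack 4.4.1); the input/output buffer handling, which the project's inference class covers, is omitted here:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by onnx_to_tensorrt.py.
with open('./yolov4_fp16.trt', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()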
For OpenVINO, the ONNX file generated following the instructions in tensorrt/README.md was used, together with the OpenVINO Model Optimizer package and the following commands:
- FP32 data format:
python3 mo.py --input_model ./yolov4.onnx --model_name yolov4_fp32 --data_type FP32 --batch 1
- FP16 data format:
python3 mo.py --input_model ./yolov4.onnx --model_name yolov4_fp16 --data_type FP16 --batch 1
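The resulting IR files (.xml/.bin) can then be loaded with the OpenVINO Inference Engine; a minimal sketch using the Python API of that OpenVINO generation, with the file names following from the mo.py calls above:
from openvino.inference_engine import IECore

ie = IECore()
# Read the IR produced by the Model Optimizer.
net = ie.read_network(model='yolov4_fp16.xml', weights='yolov4_fp16.bin')
# MYRIAD targets the Neural Compute Stick 2; CPU and GPU also work.
exec_net = ie.load_network(network=net, device_name='MYRIAD')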
Conversion to TensorFlow Lite was done using the ONNX model and the onnx-tensorflow package via its command-line interface:
onnx-tf convert -i /path/to/input.onnx -o /path/to/output
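Equivalently, the same step can be done from Python with the onnx-tensorflow API; a minimal sketch:
import onnx
from onnx_tf.backend import prepare

# Convert the ONNX graph to a TensorFlow representation
# and export it as a SavedModel directory.
onnx_model = onnx.load('/path/to/input.onnx')
tf_rep = prepare(onnx_model)
tf_rep.export_graph('/path/to/output')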
Either way, the model is converted from the ONNX format to the TensorFlow SavedModel representation. Conversion to TF Lite is then done with TensorFlow's built-in TFLiteConverter.
- FP32 format
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_PATH)
tflite_model = converter.convert()
# Save the model.
with open(OUTPUT_PATH, 'wb') as f:
f.write(tflite_model)
- FP16 format
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_PATH)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
# Save the model.
with open(OUTPUT_PATH, 'wb') as f:
f.write(tflite_model)
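A minimal sketch of running the converted model with the TF Lite interpreter; the file name is assumed here, and the input is random data just to exercise the model:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='./yolov4_fp16.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the model's expected shape and dtype.
dummy = np.random.random_sample(input_details[0]['shape']).astype(
    input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
predictions = interpreter.get_tensor(output_details[0]['index'])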
Step 4: How to use the DeepSORT tracker
The DeepSORT algorithm takes detection results from YOLOv4 and associates them using a recursive Kalman filter and the Hungarian algorithm.
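The Hungarian (optimal assignment) step can be illustrated with SciPy; a minimal sketch on a hypothetical track-detection cost matrix (DeepSORT's actual implementation additionally applies gating and a matching cascade):
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: existing tracks, columns: new detections;
# entries are combined motion/appearance costs.
cost = np.array([[0.2, 0.9, 0.8],
                 [0.7, 0.1, 0.6]])
track_idx, det_idx = linear_sum_assignment(cost)
# Pairs each track with a detection so that the total cost is minimal.
print(list(zip(track_idx, det_idx)))  # [(0, 0), (1, 1)]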
To run the multiple object tracking algorithm, use the object_tracker.py script as follows:
python3 object_tracker.py -f trt -m ./yolov4_int8.trt -s 608 -n ./data/classes/visdrone.names -v <PATH TO INPUT VIDEO OR CAMERA> --dont_show True
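To additionally save the annotated video, the -o and --output_format options listed below can be combined with the same call; the output path and codec here are illustrative:
python3 object_tracker.py -f trt -m ./yolov4_int8.trt -s 608 -n ./data/classes/visdrone.names -v <PATH TO INPUT VIDEO OR CAMERA> -o ./output.mp4 --output_format mp4v --dont_show True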
The results of the object tracker script are presented in the video below; the run was performed on an NVIDIA Jetson Xavier NX using TensorRT in FP32 mode.
NOTE
If one wants to run inference on an NVIDIA device with TensorRT support, the TrtYOLO detector import in detectors/__init__.py needs to be uncommented. The same applies to using the OpenvinoYOLO class on Intel hardware.
Command-line arguments
Usage: object_tracker.py [OPTIONS]
Options:
-f, --framework TEXT Inference framework: {tf, tflite, trt, opencv,
openvino}
-m, --model_path TEXT Path to detection model
-n, --yolo_names TEXT Path to YOLO class names file
-s, --size INTEGER Model input size
-v, --video_path TEXT Path to input video
-o, --output TEXT         Path to the output (annotated) video
--output_format TEXT Codec used in VideoWriter when saving video to
file
--tiny BOOLEAN If YOLO tiny architecture
--model_type TEXT yolov3 or yolov4
--iou FLOAT IoU threshold
--score_threshold FLOAT Confidence score threshold
--opencv_dnn_target TEXT Precision of OpenCV DNN model
--device TEXT OpenVINO inference device, available: {MYRIAD,
CPU, GPU}
--dont_show BOOLEAN Do not show video output
--info BOOLEAN Show detailed info of tracked objects
--count BOOLEAN Count objects being tracked on screen
--help Show this message and exit.
Step 5: Performance tests
The benchmark tests were performed on the NVIDIA Jetson Xavier NX and the Intel Neural Compute Stick 2. The Jetson Xavier NX was set to power mode 2 (sudo nvpmodel -m 2), and the fan and clocks were set to the maximum frequency with the sudo jetson_clocks --fan command. The Intel NCS 2 was evaluated with a Raspberry Pi 4B as the host. The evaluation results are listed below.
During the performance evaluation, the energy efficiency of benchmarked edge devices was checked. The power consumption of Intel Neural Compute Stick 2 and Raspberry Pi 4B was measured with the use of a USB multifunction tester as is shown in the figures below.
Jetson Xavier NX energy usage was inspected with the jetson-stats package, which is intended for monitoring and controlling NVIDIA Jetson devices. The results of the performed tests are presented in the table below.
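A minimal sketch of reading live board statistics with jetson-stats (the exact keys available in jetson.stats, including the power readings, depend on the package version):
from jtop import jtop

# Open a connection to the jtop service and poll board statistics,
# which include utilization and power readings.
with jtop() as jetson:
    while jetson.ok():
        print(jetson.stats)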
For comparison, graphics cards used for inference and computing in cloud data centers, such as the NVIDIA V100 or RTX 3080, have power consumption at the level of 300 and 320 watts, respectively.
Use cases of a tracking algorithm
The possible use cases of multiple object tracking are:
- surveillance monitoring
- crossroads flow tracking
- monitoring and warning in unsafe places
Many thanks for their great work to:
- The AI Guy: yolov4-deepsort, MIT License
- nwojke: deep_sort, MIT License
- jkjung-avt: tensorrt_demos, MIT License