The Object of My Detection
Some clever tricks led to the creation of an object detection model that is more efficient than existing methods while maintaining accuracy.
It is hard to dispute that object detection is one of the most important techniques in the entire field of computer vision. Without good algorithms to handle object detection, self-driving cars, face detection, video surveillance, crowd counting, and a whole host of other very important applications would be either impossible to implement or impractical for real-world use. Fortunately, object detection is a very active area of research in machine learning, and some very useful models have already emerged from it. YOLO and MobileNet-SSD, for example, have set high bars for what object detectors can achieve.
Accuracy improvements tend to get the lion's share of attention from those working to advance the state of the art. And that is understandable; after all, what good is a model that cannot tell a stop sign from a green light? But if a model requires a lot of computational resources to run inference, it may be too expensive, or otherwise impractical, to deploy for many use cases. A group of researchers at the Hefei Institutes of Physical Science recently published work that seeks to make object detection algorithms more efficient. Their object detector, dubbed M2YOLOF, uses some clever methods to improve performance while maintaining a high level of accuracy.
In the course of their work, the team noted that existing deep learning-based object detection methods incur very high computational costs because deep network structures repeatedly extract and fuse features. The framework they propose uses a multi-input, single-output design for object recognition, as opposed to the multi-input, multi-output models that are common today: multiple feature inputs are distilled into a single output used for detection. This shift in approach reduces model complexity and, with it, inference time.
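To make that architectural shift more concrete, the sketch below shows one way a multi-input, single-output neck can look in PyTorch: several backbone feature levels are projected to a common width, resized, and summed into one map that feeds a single detection head. The module name, channel widths, and fusion-by-summation are illustrative assumptions for this sketch, not the authors' published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MiSoNeck(nn.Module):
    """Illustrative multi-input, single-output neck (a sketch, not the paper's module):
    several backbone feature levels are fused into one map for a single detection head."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 projections bring every input level to the same channel width.
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # A single 3x3 conv smooths the fused map before the detection head.
        self.fuse = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, features):
        # Resize every projected level to the spatial size of the first input.
        target_size = features[0].shape[-2:]
        resized = [
            F.interpolate(proj(f), size=target_size, mode="nearest")
            for proj, f in zip(self.projections, features)
        ]
        # Element-wise sum collapses all levels into one output feature map,
        # so only one detection head needs to run at inference time.
        return self.fuse(sum(resized))


if __name__ == "__main__":
    # Dummy C3/C4/C5-style feature maps from a ResNet-like backbone.
    c3 = torch.randn(1, 512, 80, 80)
    c4 = torch.randn(1, 1024, 40, 40)
    c5 = torch.randn(1, 2048, 20, 20)
    fused = MiSoNeck()([c3, c4, c5])
    print(fused.shape)  # torch.Size([1, 256, 80, 80])
```

Because only one output level reaches the head, only one set of detection convolutions runs per image, which is where the savings in complexity and inference time come from.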
The research also introduced a trio of new techniques to extract hot spot feature information more efficiently: a receptive field adjustment mechanism, a residual attention self-learning mechanism, and a dynamic balance sampling approach based on the effective receptive field (eRF). These improvements boosted both accuracy and efficiency beyond what the multi-input, single-output framework alone could achieve.
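As an illustration of what receptive field adjustment can look like in practice, the sketch below stacks residual blocks with growing dilation rates so that a single feature level covers a wider range of object scales. This is a generic example of the technique using dilated convolutions, assumed here for illustration; it is not a reproduction of the paper's specific mechanisms.

```python
import torch
import torch.nn as nn


class DilatedResidualBlock(nn.Module):
    """Generic receptive-field adjustment sketch: a bottleneck residual block whose
    3x3 conv uses dilation to enlarge the receptive field without downsampling."""

    def __init__(self, channels=256, dilation=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            # Dilation widens how far this 3x3 conv "sees" on the feature map.
            nn.Conv2d(channels // 4, channels // 4, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
        )

    def forward(self, x):
        # The residual connection preserves fine detail for small objects while
        # the dilated branch adds context for large ones.
        return x + self.block(x)


if __name__ == "__main__":
    x = torch.randn(1, 256, 80, 80)
    # Progressively larger dilations let one feature level cover many object scales.
    stack = nn.Sequential(*[DilatedResidualBlock(dilation=d) for d in (2, 4, 6, 8)])
    print(stack(x).shape)  # torch.Size([1, 256, 80, 80])
```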
When compared with YOLOF, an object detector designed to be more performant than existing models, M2YOLOF was found to be similar in efficiency but more accurate. Benchmarked on the COCO dataset with a ResNet50 backbone, the team's approach achieved an average precision 2.6% higher than YOLOF's. This validation showed that the methods described do in fact deliver both accuracy and efficiency in a single object detector.
The team hopes that their work will inspire other research efforts to develop even more efficient models. As efficiency increases, so does the range of use cases a method can serve, and the closer a model comes to the right balance of accuracy and efficiency, the more widely machine learning algorithms can be deployed. This may not be the final step in that process, but it is one step closer to that ultimate goal.