Facing the Problem Head-on
By leveraging a depth camera and gyroscopic sensor, this deep learning-based system can accurately recognize the orientation of a face.
Face orientation estimation is a core task in computer vision and human-computer interaction, with a wide range of practical applications. One of the best-known is in driver monitoring systems, which are designed to improve road safety. Machine learning models can analyze a driver's face orientation in real time to determine whether they are paying attention to the road or whether their attention has been diverted, for instance by texting or dozing off. These systems can then issue alerts or activate safety mechanisms to reduce the risk of accidents.
The technique is applicable well beyond road safety, however. In human-computer interaction, it can support gaze tracking and facial expression analysis, enabling more immersive and responsive virtual reality experiences and improving how users interact with devices and software. In the healthcare sector, it can help assess neurological conditions by monitoring head orientation and facial movements.
But to enable these applications, existing face orientation estimation systems need improvement. Traditionally, the task has relied on recognizing characteristic facial features, such as the nose, eyes, and mouth, and tracking their movements to infer orientation. These methods have notable limitations: they raise privacy concerns, and they can fail when individuals wear masks or hold their heads at unexpected angles.
These shortcomings have led researchers to experiment with estimating face orientation from point cloud data captured by a depth sensor. While these efforts have shown much promise, they tend to be limited to recognizing only a handful of discrete face orientations, which is hardly acceptable for safety-critical applications.
In response, a pair of researchers at the Shibaura Institute of Technology in Japan have developed a deep learning-based approach to the problem. By integrating an additional sensor into the model training process, they found that they could accurately identify arbitrary facial orientations from point cloud data. Moreover, their techniques allowed them to do this with only a small amount of training data.
The team leveraged a 3D depth camera, like previous methods, but also included a gyroscopic sensor during the training process. As data was collected, the point clouds from the depth camera were paired with precise face orientation measurements from a gyroscopic sensor strapped to the back of the subject's head, providing an accurate, consistent measure of the head's horizontal angle of rotation.
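The researchers' exact data pipeline is not described here, but the pairing step can be pictured as a simple time alignment between the two sensor streams. The sketch below is a hypothetical illustration: the function name pair_frames_with_yaw and all of its arguments are assumptions for this example, not details from the team's code.

```python
import numpy as np

def pair_frames_with_yaw(point_clouds, cloud_timestamps,
                         gyro_yaws, gyro_timestamps):
    """Pair each depth-camera point cloud with the gyroscope yaw
    reading closest to it in time, yielding (cloud, label) samples.

    point_clouds:     list of (N, 3) arrays of x, y, z points
    cloud_timestamps: 1-D array of capture times in seconds
    gyro_yaws:        1-D array of head yaw angles in degrees
    gyro_timestamps:  1-D array of gyro sample times in seconds
    """
    dataset = []
    for cloud, t in zip(point_clouds, cloud_timestamps):
        # Nearest-neighbor match in time between the two sensor streams
        idx = np.argmin(np.abs(gyro_timestamps - t))
        dataset.append((cloud.astype(np.float32), float(gyro_yaws[idx])))
    return dataset
```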
By collecting data from a wide range of angles, the researchers found that they could train a model that recognizes far more head positions than the five or so that traditional methods typically detect. With more data, the system could conceivably learn to recognize any possible head orientation. And thanks to the precise labels provided by the gyroscopic sensor, only a relative handful of samples is needed to achieve that result.
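Because the gyroscope supplies a continuous yaw angle rather than a class label, the natural formulation is regression on the point cloud. The researchers' actual architecture is not specified here; the following PyTorch sketch shows one plausible setup, a minimal PointNet-style network with an invented name (YawRegressor) and stand-in data, purely for illustration.

```python
import torch
import torch.nn as nn

class YawRegressor(nn.Module):
    """Minimal PointNet-style network: a shared per-point MLP,
    a symmetric max-pool over points, and a regression head that
    outputs a single yaw angle in degrees."""
    def __init__(self):
        super().__init__()
        # Conv1d with kernel size 1 applies the same MLP to every point
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, points):          # points: (batch, 3, num_points)
        x = self.features(points)       # (batch, 256, num_points)
        x = x.max(dim=2).values         # order-invariant pooling over points
        return self.head(x).squeeze(1)  # (batch,) predicted yaw

model = YawRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a batch of (cloud, yaw) pairs; random tensors
# stand in for samples from the paired dataset
clouds = torch.randn(8, 3, 1024)
yaws = torch.rand(8) * 360.0 - 180.0
optimizer.zero_grad()
loss = loss_fn(model(clouds), yaws)
loss.backward()
optimizer.step()
```

The max pooling makes the prediction independent of point ordering, which matters because depth cameras do not return points in any consistent order.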
During normal operation, only the depth camera is needed. These cameras do not capture images like traditional cameras, so they preserve the privacy of the individual being observed. Moreover, the point cloud data they capture can still be used to determine facial orientation even if the individual is wearing a mask or sitting in a dark environment.
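Continuing the hypothetical sketch above, inference then requires nothing but a point cloud; no gyroscope reading appears anywhere at this stage.

```python
# At run time only the depth camera is used; the gyroscope was
# needed only to label the training data.
model.eval()
with torch.no_grad():
    cloud = torch.randn(1, 3, 1024)  # stand-in for one live depth frame
    yaw_deg = model(cloud).item()
print(f"Estimated head yaw: {yaw_deg:.1f} degrees")
```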
Looking ahead, the team plans to improve the accuracy and efficiency of the system. They hope to demonstrate that it can run on small, resource-constrained devices, which would allow it to power real-world applications.