Thinking Beyond the Touchscreen
Acoustic+Pose leverages Acoustic Surface technology and machine learning to recognize hand gestures performed near a smartphone's screen.
Combining hand gesture inputs with traditional touchscreen interactions could make smartphones more seamless and intuitive to use. Hand gestures can handle a variety of tasks, from simple ones like navigating menus and apps to more complex ones like controlling media playback or taking photos. With intuitive gestures, users can quickly switch between apps, scroll through web pages, or zoom in and out on images, making smartphone use faster and more efficient.
One of the most significant advantages of hand gestures over touchscreens is that they reduce the need for physical contact, letting users interact with their devices when touching the screen is impractical, such as while wearing gloves, cooking, or working with dirty hands. This can be especially valuable where the screen must be kept clean, such as in medical settings or during activities that expose the device to harsh elements.
Most techniques for recognizing hand gestures using an unmodified, commercial smartphone rely on the smartphone's speaker to emit acoustic signals, which reflect off a nearby hand and back to the microphone, where a machine learning algorithm interprets them. However, because the hardware was not originally designed for this purpose, the positioning of the speaker and microphone is not ideal. As a result, these systems can generally detect hand movements but have difficulty recognizing static hand gestures.
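To make that limitation concrete, the Python sketch below illustrates why speaker-and-microphone systems favor motion: a moving hand produces Doppler sidebands around an emitted inaudible tone, while a static hand leaves the spectrum largely unchanged. The 20 kHz pilot tone, window size, and sideband range are illustrative assumptions, not values from the research.

```python
# Minimal sketch of conventional speaker-to-mic acoustic sensing:
# emit an inaudible pilot tone, then measure Doppler sideband energy
# in the microphone signal. A moving hand smears energy around the
# pilot frequency; a static hand leaves the spectrum nearly unchanged,
# which is why such systems struggle with static poses.
# All constants here are assumptions for illustration.
import numpy as np

FS = 48_000      # sample rate (Hz)
PILOT = 20_000   # assumed inaudible pilot tone (Hz)
N = 4096         # FFT window size

def doppler_energy(frame: np.ndarray) -> float:
    """Return energy in the sidebands around the pilot tone."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(N), n=N))
    freqs = np.fft.rfftfreq(N, d=1 / FS)
    # Sidebands: 50-300 Hz away from the pilot on either side (assumed).
    sideband = (np.abs(freqs - PILOT) > 50) & (np.abs(freqs - PILOT) < 300)
    return float(np.sum(spectrum[sideband] ** 2))

# Simulate one window of the received signal: pilot tone plus noise.
t = np.arange(N) / FS
frame = np.sin(2 * np.pi * PILOT * t) + 0.01 * np.random.randn(N)
print("sideband energy:", doppler_energy(frame))
```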
A pair of engineers at the Tokyo University of Technology and Yahoo Japan Corporation believe that the ability to detect static hand gestures could unlock many new possibilities and efficiencies. They have developed a system called Acoustic+Pose that, instead of the standard speaker, leverages the Acoustic Surface technology available on some smartphone models. Acoustic Surface vibrates the entire surface of a smartphone’s screen to radiate acoustic signals much more widely and powerfully.
Acoustic+Pose was built to detect static hand poses within a few inches of the screen. Inaudible acoustic signals are propagated through the case of the phone using the Acoustic Surface technology. When these radiated waves encounter a hand in front of the screen, they are modulated in distinct ways as they reflect back toward the phone, where a microphone captures them. A machine learning model then interprets this information; the team evaluated a number of models and found that a random forest algorithm delivered the highest accuracy.
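The article does not detail the exact features fed to the classifier, but the general pipeline might look something like the Python sketch below, which reduces each microphone recording to log-magnitude FFT bins (an assumed feature representation) and trains scikit-learn's RandomForestClassifier, the model type the team found most accurate, on toy data.

```python
# Sketch of the classification stage: reduce each recording to a
# spectral feature vector and classify it with a random forest.
# The feature extraction (log-magnitude FFT bins) is an assumption
# for illustration; the team's exact features may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def spectral_features(recording: np.ndarray, n_bins: int = 128) -> np.ndarray:
    """Log-magnitude spectrum, downsampled to a fixed-length vector."""
    log_mag = np.log1p(np.abs(np.fft.rfft(recording)))
    idx = np.linspace(0, len(log_mag) - 1, n_bins).astype(int)
    return log_mag[idx]

# Toy training data: synthetic recordings labeled with 10 pose classes.
rng = np.random.default_rng(0)
X = np.array([spectral_features(rng.standard_normal(4096)) for _ in range(200)])
y = rng.integers(0, 10, size=200)  # 10 static hand pose labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print("predicted pose:", clf.predict(X[:1])[0])
```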
A small study of eleven participants was conducted to assess the real-world performance of Acoustic+Pose. The algorithm was first trained to recognize ten different static hand poses. Then, each participant was asked to perform each hand pose for a period of 1.5 seconds. The team found that their system identified those hand poses with an average accuracy of 90.2%.
In a series of demonstrations, the team showed how Acoustic+Pose could be used to, for example, perform file operations on a smartphone that would otherwise require interacting with small icons or long-pressing on the screen. They also demonstrated hand poses controlling a map application, performing operations like zooming.
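As a purely hypothetical illustration of how such demos could be wired up, the Python sketch below maps recognized pose labels to application actions like zooming a map. The pose names and handlers are invented for illustration and are not taken from the Acoustic+Pose demonstrations.

```python
# Hypothetical dispatch from recognized static poses to app actions.
# Pose names and handlers are invented for illustration.
from typing import Callable

def zoom_in() -> None:
    print("map: zoom in")

def zoom_out() -> None:
    print("map: zoom out")

def open_file_menu() -> None:
    print("files: open context menu")

POSE_ACTIONS: dict[str, Callable[[], None]] = {
    "pinch": zoom_out,
    "spread": zoom_in,
    "flat_palm": open_file_menu,
}

def on_pose_recognized(pose: str) -> None:
    """Dispatch a recognized static hand pose to its bound action."""
    action = POSE_ACTIONS.get(pose)
    if action is not None:
        action()

on_pose_recognized("spread")  # -> map: zoom in
```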
Acoustic Surface is still an emerging technology that is not available on most smartphone models, so the future utility of Acoustic+Pose hinges on widespread adoption of that hardware, which is far from certain. But the team is continuing to improve their system and make it more robust in case that future becomes a reality.