A Clever Neural Network Lets You Decide Exactly What Noises These Headphones Cancel

Running a novel neural network on a connected smartphone, these "semantic hearing" headphones can isolate individual sound types on demand.

Researchers at the University of Washington, working with Microsoft, have developed a prototype of noise-canceling headphones with "semantic hearing" capabilities powered by machine learning, allowing the wearer to decide which noises they would like to hear while the headphones cancel everything else.

"Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today's noise canceling headphones haven’t achieved," explains senior author Shyam Gollakota of the problem the team set out to solve. "The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can't be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second."

These aren't your usual noise-canceling headphones: these let you pick exactly what you want to hear. (📹: Veluri et al)

The speed issue aside, the idea is disarmingly simple: rather than canceling all incoming sound, or selected frequencies, the prototype classifies the sounds around the wearer and passes through only those the user has chosen to hear. It's a step above existing noise-canceling headphones, which at best offer a setting to pass through the frequencies used by human speech.
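
To make the concept concrete, here's a minimal sketch of that selective pass-through idea in Python with NumPy. It assumes a separation network has already produced one estimated waveform per sound class; the class names and the semantic_passthrough helper are hypothetical illustrations, not the team's code. The output simply sums the classes the wearer opted in to and mutes the rest.

```python
import numpy as np

# Hypothetical sound classes; the system's actual 20-class taxonomy may differ.
SOUND_CLASSES = ["speech", "birds", "vacuum_cleaner", "car_horn", "alarm_clock"]

def semantic_passthrough(class_stems: dict[str, np.ndarray],
                         wanted: set[str]) -> np.ndarray:
    """Sum only the per-class source estimates the wearer asked to hear.

    class_stems maps a class name to its estimated waveform (what a
    separation network would output per class); everything else is muted.
    """
    kept = [stem for name, stem in class_stems.items() if name in wanted]
    if not kept:
        return np.zeros_like(next(iter(class_stems.values())))
    return np.sum(kept, axis=0)

# Toy demo with random "waveforms" standing in for separated sources.
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(16000) for name in SOUND_CLASSES}
output = semantic_passthrough(stems, wanted={"speech", "car_horn"})
```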

The prototype developed by the team certainly shows promise. The wearable was tested in scenarios including holding a conversation while a nearby vacuum cleaner runs, muting street chatter while listening to birds, removing construction sounds while still being able to hear car horns in traffic, and even canceling all noises during meditation save for an alarm clock indicating when the session is over.

The trick to processing the sound as rapidly as possible is to offload it to a more powerful device than can be crammed into a pair of headphones: the user's smartphone. The phone runs a specially developed neural network tailored for binaural sound extraction, which the researchers claim is the first of its kind.
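
What might that real-time loop look like? Below is a hedged sketch, not the team's implementation: audio is pulled in short chunks, each chunk is run through a placeholder extract_target function standing in for the on-phone network, and anything over the roughly ten-millisecond budget the researchers cite would break audio-visual sync. The sample rate, chunk size, and function names are all assumptions for illustration.

```python
import time
import numpy as np

SAMPLE_RATE = 44100          # assumption; the paper's rate may differ
CHUNK_MS = 8                 # process audio in short ~8 ms blocks (assumed)
BUDGET_MS = 10               # "under a hundredth of a second," per Gollakota
CHUNK = SAMPLE_RATE * CHUNK_MS // 1000

def extract_target(chunk: np.ndarray) -> np.ndarray:
    """Placeholder for the on-phone extraction network (hypothetical)."""
    return chunk  # identity stand-in so the loop is runnable

def stream(mic_frames):
    """Pull binaural chunks, process each, and yield them for playback.

    Each chunk must clear the model well inside the budget, or the output
    drifts out of sync with what the wearer sees.
    """
    for chunk in mic_frames:
        t0 = time.perf_counter()
        out = extract_target(chunk)
        elapsed_ms = (time.perf_counter() - t0) * 1000
        assert elapsed_ms < BUDGET_MS, "missed the real-time deadline"
        yield out

# Toy demo: ten random stereo chunks in place of a live microphone feed.
frames = (np.random.randn(2, CHUNK) for _ in range(10))
for _ in stream(frames):
    pass
```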

The prototypes were tested in a variety of scenarios, including attempting to converse while someone vacuums around you. (📹: Veluri et al)

"Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56ms on a connected smartphone," the team writes. "In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output."

The researchers' work has been published in the Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23) under closed-access terms; an open-access preprint is available on Cornell's arXiv server, while samples are available on the project website. Code publication has been promised, but at the time of writing the GitHub repository was empty bar a readme file.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.