I’m Just Thinking Out Loud

A deep learning-based approach non-invasively translates brain activity into speech, giving a voice to those who are unable to speak.

DeWave translates thoughts into speech (📷: University of Technology Sydney)

For individuals who are unable to speak due to illness or injury, such as stroke or paralysis, assistive technologies play a pivotal role in restoring communication and enhancing their quality of life. The inability to speak can arise from various conditions, including neurological disorders, traumatic injuries, or degenerative diseases, presenting significant challenges for those affected. Communication is a fundamental aspect of human interaction, and losing the ability to speak can lead to isolation, frustration, and a diminished sense of autonomy.

According to the World Health Organization, approximately 15 million people worldwide suffer a stroke each year, and a significant proportion of survivors experience communication difficulties, including speech impairment. Furthermore, conditions like amyotrophic lateral sclerosis, spinal cord injuries, and certain types of paralysis also contribute to a substantial number of individuals facing speech-related challenges. The impact of speech impairment extends beyond verbal communication, affecting social interactions, relationships, and even employment opportunities for those individuals.

To address these challenges, assistive technologies have emerged as crucial tools. For example, augmentative and alternative communication devices allow people to express themselves using text, symbols, or synthesized speech. Eye-tracking technology enables users to control devices using eye movements, allowing those with limited mobility to navigate communication devices and express their thoughts. Speech-generating devices use sophisticated algorithms to convert typed text or selected symbols into audible speech, providing a voice for those who cannot speak independently.

Demonstrating the system (📷: University of Technology Sydney)

Unfortunately, these existing options can be slow and cumbersome, hindering normal communication. Furthermore, because of their illness or injury, many individuals cannot use some of these technologies at all. Researchers at the University of Technology Sydney have recently developed a new type of assistive technology that is much faster and more natural to operate, and that is accessible to nearly everyone. Their system non-invasively measures brain activity and translates those signals into synthetic speech.

Brain-computer interfaces of this sort frequently require probes to be surgically implanted into the brain. Needless to say, this is undesirable and will deter many people from using the technology. In this new work, the team has instead used a non-invasive method of measuring brain signals: an electroencephalogram (EEG) cap. This allows thought to be translated into speech without any surgery.

The quality of signals captured by EEG is much poorer than that of signals acquired by implanted probes, however. To overcome this problem, the team developed a deep learning model called DeWave, which translates raw EEG waves into words and sentences. This was made possible by training the model on large quantities of EEG data paired with the words or phrases that were being spoken at the time of the recording.
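To make the idea of learning from paired recordings concrete, here is a minimal Python sketch. It is not DeWave and does not use real EEG: the three-number "recordings", the tiny vocabulary, and the helper names (`make_recording`, `train_centroids`, `decode`) are all hypothetical, and a simple nearest-centroid classifier stands in for the deep learning model. It only illustrates the general pattern of pairing signals with words, learning from those pairs, and then decoding a new signal.

```python
import random


def make_recording(template, noise, rng):
    """Simulate one noisy feature vector around a per-word template.

    Synthetic stand-in for an EEG recording; real EEG is far higher
    dimensional and much noisier.
    """
    return [x + rng.gauss(0.0, noise) for x in template]


def train_centroids(pairs):
    """Average the feature vectors that were paired with each word."""
    sums, counts = {}, {}
    for features, word in pairs:
        if word not in sums:
            sums[word] = [0.0] * len(features)
            counts[word] = 0
        sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def decode(features, centroids):
    """Return the word whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda w: dist(centroids[w]))


# Hypothetical vocabulary: each word gets an idealized signal template.
rng = random.Random(0)
templates = {
    "yes": [1.0, 0.0, 0.0],
    "no": [0.0, 1.0, 0.0],
    "water": [0.0, 0.0, 1.0],
}

# Build (recording, word) training pairs, 20 noisy recordings per word.
training_pairs = [
    (make_recording(template, 0.1, rng), word)
    for word, template in templates.items()
    for _ in range(20)
]

centroids = train_centroids(training_pairs)
new_recording = make_recording(templates["water"], 0.1, rng)
print(decode(new_recording, centroids))
```

In this toy setting the signal for each word is clean and well separated, so decoding is easy. Real scalp EEG is the opposite, which is why a much larger model and training set are needed in practice.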

It is well known that EEG signals can differ significantly between individuals, even as they silently speak the same words. As such, this type of system can be challenging to implement in a generalized way that will work for a wide range of individuals. In this work, a study involving 29 participants indicated that DeWave is more robust than many existing technologies, likely as a result of the large training set used in its development.

Despite the team’s successes, there is still much work to be done. As it presently stands, the translation accuracy of the interface is about 40%. To truly provide clear, natural communication capabilities, that accuracy will need to improve significantly. It was also noted that DeWave translates verbs much more easily than nouns, indicating that there are additional areas in need of improvement. But with some enhancements, these techniques could one day not only give a voice to the voiceless, but also enhance communication between humans and machines.
