This project was made as a submission for the Gen AI Intensive Course Capstone 2025Q1. I'd really appreciate it if you could upvote and drop a comment on my Kaggle project linked here. Thank you!
People with severe motor disabilities—like those with ALS, Locked-in Syndrome, or neurodegenerative conditions—often struggle to communicate due to limited voluntary movement and speech. While assistive technologies like eye-tracking, muscle-triggered switches, and speech devices exist, they tend to be slow, inflexible, and lack contextual awareness.
Brain-computer interfaces (BCIs), such as P300 spellers, offer a non-invasive alternative, but they still face challenges like low throughput and limited adaptability. Most don’t account for the user’s environment or intent beyond basic signal detection.
There’s a clear need for a smarter, more integrated system—one that can interpret multiple input signals (like EEG, eye movement, or muscle activity) and generate context-aware, meaningful text in real-time. With recent advances in large language models (LLMs), we now have the tools to bridge that gap.
This project explores how non-invasive EEG signals, when combined with contextual data and LLMs, can power a more adaptive and intelligent communication interface.
Reading Brainwaves with an EEG Machine
I have an EEG headset from a Star Wars Force Trainer II, but any NeuroSky ThinkGear-compatible EEG headset should work.
This headset has three electrodes, one sitting on your forehead and two that go right behind your ear.
The device outputs the five major brainwave frequency bands, normalized "attention" and "meditation" values, and the raw EEG potential readings.
Blinking/eyelid movement elicits a large potential spike that is picked up by the electrodes, as you can see in this GIF.
Usually this is an unwanted artifact that gets filtered out, but in our case it is exactly what we use as our control signal.
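To make this concrete, here is a minimal sketch of threshold-based blink detection on the raw EEG stream. The threshold, refractory period, and the idea of passing the raw samples in as an iterable are illustrative assumptions; the actual headset interface and calibration in the project may differ.

```python
import time
from typing import Iterable, Iterator

def detect_blinks(
    raw_samples: Iterable[float],
    threshold: float = 150.0,      # illustrative amplitude threshold (device units)
    refractory_s: float = 0.4,     # briefly ignore spikes after a blink to avoid double counts
) -> Iterator[float]:
    """Yield a timestamp whenever a blink-sized spike appears in the raw EEG stream.

    `raw_samples` is any iterable of raw EEG potentials from a
    ThinkGear-compatible headset (hypothetical interface).
    """
    last_blink = 0.0
    for sample in raw_samples:
        now = time.time()
        if abs(sample) > threshold and (now - last_blink) > refractory_s:
            last_blink = now
            yield now
```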
Brain-to-Text Input
Turning blink detection into text input is actually quite simple using a speller board (also called a speller matrix).
A speller board is a common interface used in brain-computer and assistive communication systems to help users type without physical input. It typically displays a grid of characters (like letters or words), and the system highlights rows and columns in a pattern.
The user selects their desired character by responding, often with a detectable signal like a blink, eye movement, or a brainwave pattern, when the correct row or column flashes. The system then combines the selected row and column to determine the intended character.
In this project, blinks are used as the trigger to confirm a selection, acting as a signal of user intent.
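As a rough illustration, the row/column scanning loop could look like the sketch below. The 6x6 board layout, the dwell time, and the `queue.Queue` of blink timestamps (fed by the blink detector above) are all assumptions for illustration, not the project's exact UI.

```python
import itertools
import queue

# 6x6 speller matrix: letters, a few punctuation marks, and digits
BOARD = [
    list("ABCDEF"),
    list("GHIJKL"),
    list("MNOPQR"),
    list("STUVWX"),
    list("YZ_.,?"),
    list("012345"),
]

def blink_during(blinks: "queue.Queue[float]", dwell_s: float) -> bool:
    """Return True if a blink timestamp arrives while the current item is highlighted."""
    try:
        blinks.get(timeout=dwell_s)
        return True
    except queue.Empty:
        return False

def select_character(blinks: "queue.Queue[float]", dwell_s: float = 1.0) -> str:
    """Scan rows, then columns of the chosen row; a blink confirms the highlighted item."""
    for row in itertools.cycle(range(len(BOARD))):
        print("Row:", " ".join(BOARD[row]))        # stand-in for highlighting the row in a UI
        if blink_during(blinks, dwell_s):
            break
    for col in itertools.cycle(range(len(BOARD[row]))):
        print("Column:", BOARD[row][col])          # stand-in for highlighting the cell
        if blink_during(blinks, dwell_s):
            return BOARD[row][col]
```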
Although we could simply use the text the user has entered to generate auto-complete suggestions, there is a high chance of getting irrelevant or incorrect completions, which defeats the purpose of minimizing cognitive and physical effort.
However, by capturing what the user sees and hears, we can use the conversation and visual setting to better inform the LLM, before generating suggestions.
The LLM can use the surrounding context to generate highly relevant, natural-language suggestions that greatly reduce the user's effort to communicate their intentions.
This is done quite simply, using a body-mounted camera to periodically take pictures and a microphone to capture conversational audio.
The images and audio are sent to the LLM to generate descriptive text and transcriptions of the conversation.
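A minimal sketch of that step, assuming the google-generativeai SDK and a Gemini multimodal model (the model name, prompts, and API key handling are illustrative and may differ from the actual notebook):

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # illustrative; use your own key management
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

def describe_scene(image_path: str) -> str:
    """Ask the model to describe what the body-mounted camera currently sees."""
    image = Image.open(image_path)
    response = model.generate_content(
        ["Describe this scene in 2-3 sentences, focusing on the setting and activity.", image]
    )
    return response.text

def transcribe_audio(audio_path: str) -> str:
    """Ask the model to transcribe a short clip of conversational audio."""
    audio_file = genai.upload_file(audio_path)     # File API upload
    response = model.generate_content(
        ["Transcribe the spoken conversation in this audio clip.", audio_file]
    )
    return response.text
```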
As an example, if we feed in the image below:
The LLM might generate the following description of the scene:
This is a well-lit physiotherapy room with various exercise and rehabilitation equipment. A parallel bar setup dominates the center, suggesting gait training. Exercise balls and resistance machines are present, indicating a focus on strength and balance exercises. The setting implies a healthcare environment geared toward physical recovery and therapy.
Putting it all together
Now that we have all our key pieces implemented, we can put them together.
The user's typed text, along with the environmental context, is fed to the LLM, which generates highly relevant suggestions to auto-complete the input.
The LLM returns strictly structured JSON output (see the sketch after the list below), containing:
- suggestion: the suggested phrase/sentence based on user input and context
- reasoning: the line of reasoning the LLM took to create the suggestion. The LLM is prompted to follow a different line of reasoning to ensure diverse outputs.
- contextual_relevance_score: a self-evaluated score from 1 to 5 indicating how relevant the suggestion is to the context
- confidence_score: a self-evaluated score from 1 to 5 indicating how confident the LLM is that the suggestion represents the user's intent
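Here is the sketch referenced above: one way to request that structured output, again assuming the google-generativeai SDK in JSON output mode. The prompt wording and the way the schema is enforced in the actual project may differ.

```python
import json
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

PROMPT_TEMPLATE = """You are helping a user with limited mobility finish their sentence.
Scene description: {scene}
Recent conversation: {transcript}
Text typed so far: "{typed}"

Return a JSON array of 3 objects, each with keys:
  suggestion, reasoning, contextual_relevance_score (1-5), confidence_score (1-5).
Each suggestion must follow a different line of reasoning."""

def generate_suggestions(scene: str, transcript: str, typed: str) -> list[dict]:
    """Request structured auto-complete suggestions and parse the JSON response."""
    response = model.generate_content(
        PROMPT_TEMPLATE.format(scene=scene, transcript=transcript, typed=typed),
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```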
Another control signal from the EEG headset indicates which option the user selected, enabling them to effectively communicate their intention with minimal input.
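For completeness, a small sketch of that final selection step, reusing the `blink_during` helper from the speller sketch above. The project may use a different confirmation signal than a single blink; this is only an illustration.

```python
import itertools
import queue

def choose_suggestion(
    suggestions: list[dict],
    confirmations: "queue.Queue[float]",
    dwell_s: float = 1.5,
) -> str:
    """Cycle through the LLM suggestions; a confirmation event selects the highlighted one."""
    for i in itertools.cycle(range(len(suggestions))):
        print("Option:", suggestions[i]["suggestion"])  # stand-in for highlighting in a UI
        if blink_during(confirmations, dwell_s):        # helper from the speller sketch above
            return suggestions[i]["suggestion"]
```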
As a whole, this system enables individuals with severe motor or speech impairments to express themselves more clearly, more quickly, and more naturally.
Live Demo