A Brain-Computer Interface That Uses Artificial Intelligence to Help Patients Produce Speech

2024-08-26

Brain-computer interfaces (BCIs) are groundbreaking technologies that help paralyzed individuals regain lost functions, such as moving their arms. These devices record signals from the brain and decode the user's intended actions, bypassing damaged or degenerated nerves that would normally transmit the brain signals to muscles for movement.


Since 2006, demonstrations of human BCIs have primarily focused on restoring arm and hand movements by enabling individuals to control computer cursors or robotic arms. Recently, researchers have started developing speech BCIs to restore communication abilities in individuals who are unable to speak.


When users attempt to speak, these BCIs record unique brain signals associated with the muscle movements involved in speech production and then translate them into text. This text can then be displayed on a screen or read aloud using text-to-speech software.
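

At a high level, the whole system is a pipeline: record signals, decode them into text, then present the text. The sketch below is purely illustrative scaffolding - every function name is invented here as a placeholder - and the sections that follow look at the real stages in more detail.

```python
# A deliberately simplified sketch of the pipeline; every function here is
# an invented placeholder standing in for a real component.

def record_signals():
    return "raw neural voltages"          # stand-in for electrode recordings

def decode_to_text(signals):
    return "hello world"                  # stand-in for the decoding stages

def speak(text):
    print(text)                          # stand-in for display / text-to-speech

speak(decode_to_text(record_signals()))  # prints: hello world
```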


Researchers at the Neural Prosthetics Laboratory at the University of California, Davis, demonstrated a speech BCI that decoded the attempted speech of a man with amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease. The interface converted his neural signals into text with better than 97% accuracy. The key to the system is a set of artificial intelligence language models - artificial neural networks - that help interpret the signals produced by the brain's natural neural networks.


Recording Brain Signals


Our first step in developing a speech BCI is to record brain signals. There are several ways to do so, some of which require surgically implanting the recording device. Implanted devices capture high-quality brain signals because they sit closer to the neurons, yielding stronger, less noisy recordings. These neural recording devices include electrode grids placed on the surface of the brain and electrodes implanted directly into brain tissue.


In our study, we surgically implanted an electrode array in the speech motor cortex - the brain region that controls the muscles involved in speech - of our study participant, Casey Harrell. We recorded neural activity from 256 electrodes while Harrell attempted to speak.
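

This article doesn't detail how raw voltages are turned into decoder inputs, but a common first step in intracortical BCIs is to count threshold crossings (putative spikes) on each electrode within short time bins. The sketch below illustrates that idea; the sampling rate, bin width, and threshold rule are assumptions for illustration, not the study's actual parameters.

```python
import numpy as np

FS = 30_000    # samples per second (assumed)
BIN_MS = 20    # bin width in milliseconds (assumed)

def bin_threshold_crossings(voltages, fs=FS, bin_ms=BIN_MS):
    """voltages: array of shape (n_electrodes, n_samples)."""
    # A common spike-detection heuristic: a crossing happens when the signal
    # dips below -4.5 times its standard deviation (an assumed rule here).
    thresh = -4.5 * np.std(voltages, axis=1, keepdims=True)
    crossings = (voltages[:, 1:] < thresh) & (voltages[:, :-1] >= thresh)
    bin_len = int(fs * bin_ms / 1000)
    n_bins = crossings.shape[1] // bin_len
    trimmed = crossings[:, :n_bins * bin_len]
    # Sum crossings within each bin -> (n_electrodes, n_bins) feature matrix
    return trimmed.reshape(voltages.shape[0], n_bins, bin_len).sum(axis=2)

# Example: one second of simulated noise from 256 electrodes
feats = bin_threshold_crossings(np.random.randn(256, FS))
print(feats.shape)   # (256, 49): ~50 bins of 20 ms each
```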


Decoding Brain Signals


The next challenge is to associate complex brain signals with the words the user is trying to say.


One approach is to map neural activity patterns directly to spoken words. This requires recording the brain signals corresponding to each word many times to establish the average relationship between neural activity and specific words. While this strategy works well for small vocabularies, as a 2021 study demonstrated with a 50-word vocabulary, it becomes impractical for larger ones. Imagine asking a BCI user to attempt to say every word in the dictionary - it would take months, and the system would still fail on new words.
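

As a toy illustration of this direct word-mapping idea, one can average the neural features recorded over repeated attempts of each word and then classify a new attempt by its nearest average pattern. Everything below - the vocabulary, the simulated data, the nearest-mean rule - is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yes", "no", "water", "help", "thirsty"]

# Simulate 20 attempts per word, each a 256-dimensional feature vector,
# with a different (artificial) mean activity pattern per word.
trials = {w: rng.normal(loc=i, size=(20, 256)) for i, w in enumerate(vocab)}
templates = {w: x.mean(axis=0) for w, x in trials.items()}  # per-word average

def classify(features):
    # Label a new trial with the word whose average pattern is closest
    return min(vocab, key=lambda w: np.linalg.norm(features - templates[w]))

print(classify(rng.normal(loc=2, size=256)))  # -> "water" (pattern index 2)
```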


Instead, we adopted an alternative strategy: mapping brain signals to phonemes, the basic sound units that make up words. In English, there are 39 phonemes, including ch, er, oo, pl, and sh, which can be combined to form any word. We only need the participant to attempt a small set of sentences to measure the neural activity associated with each phoneme many times over. By accurately mapping neural activity to phonemes, we can assemble them into any English word, even ones the system was not explicitly trained on.
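

Here is a minimal sketch of why phonemes generalize: once a phoneme sequence is decoded, a pronunciation dictionary can turn it into a word, including words the decoder never saw during training. The tiny dictionary below uses CMUdict-style ARPABET symbols and is purely illustrative.

```python
# Illustrative mini-dictionary mapping phoneme sequences to words,
# written with CMUdict-style ARPABET symbols (stress markers omitted).
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("G", "UH", "D"): "good",
}

def phonemes_to_word(phonemes):
    # A word never attempted during training is still recoverable,
    # as long as its pronunciation is in the dictionary.
    return PRONUNCIATIONS.get(tuple(phonemes), "<unknown>")

print(phonemes_to_word(["HH", "AH", "L", "OW"]))  # -> hello
```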


To map brain signals to phonemes, we utilized advanced machine learning models. These models are particularly suited for this task as they can find patterns in complex data that humans cannot discern. Think of these models as super-intelligent listeners that can pick out important information from noisy brain signals, much like focusing on a conversation in a crowded room. Using these models, we achieved over 90% accuracy in decoding phoneme sequences during speech attempts.
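

The article doesn't specify the network architecture, so the sketch below is a generic stand-in: a small recurrent network that maps a sequence of binned neural features to a probability distribution over phonemes at each time step.

```python
import torch
import torch.nn as nn

N_ELECTRODES = 256   # input channels, matching the implanted array
N_PHONEMES = 39 + 1  # 39 English phonemes plus a "silence" class (assumed)

class PhonemeDecoder(nn.Module):
    """Generic recurrent decoder; the architecture is an assumption."""
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(N_ELECTRODES, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, N_PHONEMES)

    def forward(self, x):                   # x: (batch, time_bins, electrodes)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(-1)  # (batch, time_bins, phonemes)

model = PhonemeDecoder()
features = torch.randn(1, 50, N_ELECTRODES)  # one second of 20 ms bins
print(model(features).shape)                 # torch.Size([1, 50, 40])
```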


From Phonemes to Words


Once we have the decoded phoneme sequences, we need to convert them into words and sentences. This is challenging, especially when the decoded sequences contain errors. To address this, we employed two complementary machine learning language models.


The first is an n-gram language model, which estimates how likely a word is to follow the preceding n-1 words. We trained a 5-gram model on millions of sentences to predict the probability of a word given the four words before it, capturing local context and common phrases. For example, after "I am very good," it rates "today" as far more likely than "potato." Using this model, we convert the phoneme sequence into the 100 most probable word sequences, each with an associated probability.
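

To keep the example short, the sketch below stands in a bigram (2-gram) with hand-picked probabilities for the real 5-gram estimated from millions of sentences; the scoring and ranking logic is the same.

```python
import math

# Hand-picked bigram probabilities standing in for a trained model
BIGRAM_P = {
    ("i", "am"): 0.30,
    ("am", "very"): 0.15,
    ("very", "good"): 0.20,
    ("good", "today"): 0.10,
    ("good", "potato"): 0.0001,
}

def log_score(words, floor=1e-6):
    # Sum log-probabilities over adjacent word pairs; unseen pairs get
    # a small floor probability so the score stays finite.
    return sum(math.log(BIGRAM_P.get(p, floor)) for p in zip(words, words[1:]))

candidates = [
    ["i", "am", "very", "good", "today"],
    ["i", "am", "very", "good", "potato"],
]
# Keep the most probable word sequences (here 2; the system keeps 100)
ranked = sorted(candidates, key=log_score, reverse=True)
print(ranked[0])  # -> ['i', 'am', 'very', 'good', 'today']
```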


The second is a large language model, the kind that powers AI chatbots, which likewise predicts which words are most likely to follow others. Trained on vast and diverse text, these models develop a broader understanding of language structure and meaning, and we use one to refine our selection: it helps determine which of the 100 candidate sentences is most plausible in the wider context.
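

The article doesn't name the model used, so the sketch below rescores candidates with GPT-2 through the Hugging Face transformers library as a small, freely available stand-in: it sums each candidate's token log-likelihoods and keeps the sentence the model finds most plausible.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small stand-in for the large language model
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def sentence_log_prob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the average negative log-likelihood per predicted token
        loss = lm(input_ids=ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)  # total log-likelihood

candidates = ["I am very good today.", "I am very good potato."]
print(max(candidates, key=sentence_log_prob))  # -> "I am very good today."
```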


By carefully balancing the probabilities from the n-gram model, the large language model, and our initial phoneme predictions, we can make highly educated guesses about what the BCI user intends to say. This multi-step process lets us handle uncertainty in phoneme decoding and generate coherent, contextually appropriate sentences.
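

In a log-probability formulation, this balancing step can be as simple as a weighted sum. The weights below are illustrative tuning knobs, not values from the study.

```python
# Illustrative weighted combination in log space; the weights are
# hypothetical tuning knobs, not the study's actual values.
def combined_score(phoneme_logp, ngram_logp, llm_logp,
                   ngram_weight=0.5, llm_weight=0.5):
    return phoneme_logp + ngram_weight * ngram_logp + llm_weight * llm_logp

# The candidate sentence with the highest combined score is shown to the user.
print(combined_score(-12.3, -8.1, -9.4))  # ≈ -21.05
```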


Real-World Benefits


In practice, this speech decoding strategy has been remarkably successful. It enabled Casey Harrell, who has ALS, to "speak" with over 97% accuracy using nothing but his brain signals. The breakthrough has allowed him to converse easily with his family and friends for the first time in years, all from the comfort of his own home.


Speech BCIs have taken an important step toward restoring communication. As we continue to improve these devices, they hold the potential to give a voice to individuals who have lost the ability to speak, reconnecting them with loved ones and the world around them.


However, challenges remain, such as making the technology more accessible, more portable, and durable over years of use. Despite these obstacles, speech BCIs are a powerful example of how science and engineering can come together to solve complex problems and dramatically improve people's lives.