Reading brain activity with advanced technologies is not a new concept. Most techniques to date, however, have focused on identifying single words tied to an object or action a person is seeing or thinking of, or on matching brain signals to spoken words. Some methods have relied on caption databases or deep neural networks, but these approaches were limited by the databases' word coverage or introduced information not actually present in the brain. Generating detailed, structured descriptions of complex visual perceptions or thoughts remains difficult.
A study, recently published in Science Advances, takes a new approach. Researchers involved in the study have developed what they refer to as a “mind-captioning” technique that uses an iterative optimization process, where a masked language model (MLM) generates text descriptions by aligning text features with brain-decoded features.
The technique also incorporates linear models trained to decode the semantic features of a deep language model from brain activity measured with functional magnetic resonance imaging (fMRI). The result is a detailed text description of what a participant is seeing, read out from their brain activity.
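In practice, that decoding stage amounts to fitting a regularized linear regression from fMRI voxel patterns to the language model's feature space. The sketch below illustrates the idea with synthetic data; the dimensions, the ridge penalty, and the use of scikit-learn are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical sizes; real fMRI datasets are far larger and noisier.
n_samples, n_voxels, n_features = 200, 5000, 1024  # 1024 ~ DeBERTa-large hidden size

rng = np.random.default_rng(0)
X_brain = rng.standard_normal((n_samples, n_voxels))   # voxel patterns per video
Y_text = rng.standard_normal((n_samples, n_features))  # LM features of the captions

# Ridge regression fits one linear decoder per feature dimension.
decoder = Ridge(alpha=100.0)
decoder.fit(X_brain, Y_text)

# At test time, a new scan is mapped into the language model's feature space;
# these decoded features become the target the text generation must match.
new_scan = rng.standard_normal((1, n_voxels))
decoded_features = decoder.predict(new_scan)
print(decoded_features.shape)  # (1, 1024)
```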
Generating video captions from human perception
For the first part of the experiment, six people watched 2,196 short videos while their brain activity was scanned with fMRI. The videos featured a wide assortment of objects, scenes, actions, and events. All six subjects were native Japanese speakers and non-native English speakers.
The same videos had previously been captioned by other viewers in a crowdsourced effort, and those captions were processed by a pretrained language model, DeBERTa-large, which extracted semantic features from them. These features were matched to brain activity, and text was then generated through an iterative process by the masked language model RoBERTa-large.
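The generation step can be pictured as a loop that repeatedly rewrites a candidate sentence: the masked language model proposes replacements for individual words, and whichever version's features (as extracted by the caption model) best match the brain-decoded features is kept. The sketch below, built on Hugging Face pipelines, is a simplified reading of that process; the seed sentence, the update rule, and the helper functions are assumptions, not the study's code.

```python
import numpy as np
from transformers import pipeline

# Model choices follow the article (RoBERTa-large as the masked LM,
# DeBERTa-large as the feature extractor), but the loop itself is
# a simplified illustration, not the authors' implementation.
fill_mask = pipeline("fill-mask", model="roberta-large")
featurize = pipeline("feature-extraction", model="microsoft/deberta-large")

def sentence_features(text: str) -> np.ndarray:
    # Mean-pool the token embeddings into a single sentence vector.
    return np.asarray(featurize(text)[0]).mean(axis=0)

def score(text: str, brain_features: np.ndarray) -> float:
    # Cosine similarity between candidate text features and decoded features.
    f = sentence_features(text)
    return float(f @ brain_features /
                 (np.linalg.norm(f) * np.linalg.norm(brain_features)))

def optimize_description(brain_features: np.ndarray, n_iters: int = 20) -> str:
    words = "something is happening".split()  # crude seed sentence (an assumption)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        i = int(rng.integers(len(words)))  # pick one word to rewrite
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        proposals = [words[i]] + [p["token_str"].strip() for p in fill_mask(masked)]
        # Keep whichever proposal best matches the brain-decoded features.
        words[i] = max(
            proposals,
            key=lambda w: score(" ".join(words[:i] + [w] + words[i + 1:]),
                                brain_features),
        )
    return " ".join(words)
```

Calling `optimize_description(decoded_features[0])` with the decoder output from the earlier sketch would tie the two stages together, though a real run would need many more iterations and a better starting point than this toy seed.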
“Initially, the descriptions were fragmented and lacked clear meaning. However, through iterative optimization, these descriptions naturally evolved to have a coherent structure and effectively capture the key aspects of the viewed videos. Notably, the resultant descriptions accurately reflected the content, including the dynamic changes in the viewed events. Furthermore, even when specific objects were not correctly identified, the descriptions still successfully conveyed the presence of interactions among multiple objects,” the study authors explain.
The team then measured accuracy by testing whether the generated descriptions could pick out each video's correct caption from pools of correct and incorrect candidates of varying sizes; accuracy, they say, was around 50%. They note that this level of accuracy surpasses other current approaches and holds promise for future improvement.
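That figure comes from an identification analysis: a trial counts as correct when the generated description is more similar to the true caption than to every distractor in the candidate pool. Below is a minimal sketch of such scoring with made-up feature vectors; the noise level is arbitrary and only shapes this toy's accuracy, so the printed number illustrates the metric, not the study's result.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_candidates, dim = 200, 100, 1024

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

correct = 0
for _ in range(n_trials):
    true_caption = rng.standard_normal(dim)        # features of the real caption
    # Generated description = noisy copy of the true caption's features.
    generated = true_caption + 12.0 * rng.standard_normal(dim)
    distractors = rng.standard_normal((n_candidates - 1, dim))
    sims = [cosine(generated, true_caption)] + [cosine(generated, d) for d in distractors]
    correct += int(np.argmax(sims) == 0)           # index 0 is the true caption

print(f"identification accuracy: {correct / n_trials:.0%} (chance = 1%)")
```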
Reading memories
The same six participants were later asked to recall the videos while being scanned, to test the method's ability to read memory rather than ongoing visual experience. The results for this part of the experiment were also promising.
“The analysis successfully generated descriptions that accurately reflected the content of the recalled videos, although accuracy varied among individuals. These descriptions were more similar to the captions of the recalled videos than to irrelevant ones, with proficient subjects achieving nearly 40% accuracy in identifying recalled videos from 100 candidates,” the study authors write.
For people who have a diminished or lost capacity to speak, such as those who have had a stroke, this technology could eventually serve as a way to restore communication. Because the system has proven capable of picking up on deeper meanings and relationships rather than simple word associations, it could allow these individuals to regain far more of their communication ability than some other brain-computer interface methods permit. Still, further optimization is needed before it reaches that point.
Ethical considerations and future directions
Despite the more positive applications of mind-captioning devices capable of reading human thought, there are legitimate concerns about privacy and the potential misuse of brain-to-text technology.
The researchers involved in the study note that consent will remain a major ethical consideration when employing mind-reading techniques. Before these technologies see widespread use, important questions about mental privacy and the future of brain-computer interfaces will need to be addressed.
Still, the study offers a new tool for scientific research into how the brain represents complex experiences, and a potential boon for nonverbal individuals.
The study authors write, “Together, our approach balances interpretability, generalizability, and performance—establishing a transparent framework for decoding nonverbal thought into language and paving the way for systematic investigation of how structured semantics are encoded across the human brain.”
More information:
Tomoyasu Horikawa, Mind captioning: Evolving descriptive text of mental content from human brain activity, Science Advances (2025). DOI: 10.1126/sciadv.adw1464
