[2511.21740] A cross-species neural foundation model for end-to-end speech decoding
Computer Science > Computation and Language
arXiv:2511.21740 (cs)
[Submitted on 21 Nov 2025 (v1), last revised 28 Feb 2026 (this version, v3)]

Title: A cross-species neural foundation model for end-to-end speech decoding
Authors: Yizi Zhang, Linyang He, Chaofei Fan, Tingkai Liu, Han Yu, Trung Le, Jingyuan Li, Scott Linderman, Lea Duncker, Francis R. Willett, Nima Mesgarani, Liam Paninski

Abstract: Speech brain-computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phonemes before assembling sentences with an n-gram language model (LM), preventing joint optimization across stages. Here, we introduce an end-to-end Brain-to-Text (BIT) framework that translates neural activity into coherent sentences using a single differentiable neural network. Central to our approach is a cross-task, cross-species pretrained neural encoder, whose representations transfer to both attempted and imagined speech. In a cascaded setting with an n-gram LM, the pretrained encoder establishes a new state-of-the-art (SOTA) on the Brain-to-Text '24 and '25 benchmarks. Integrated end-to-end with audio large language models (LLMs) and trained with contrastive learning for cross-modal alignment, BIT reduces the wor...
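The abstract mentions contrastive learning for cross-modal alignment between the neural encoder and the audio LLM. The paper's actual objective is not given here; as a hedged illustration only, a common choice for this kind of alignment is a symmetric InfoNCE loss over paired embeddings, where matched (neural, audio) pairs sit on the diagonal of a similarity matrix. All names, shapes, and the temperature value below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def info_nce(neural_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (illustrative sketch).

    neural_emb, audio_emb: arrays of shape (B, D); row i of each is a matched pair.
    Returns the mean of the neural->audio and audio->neural cross-entropy losses.
    """
    # L2-normalize so the dot product is cosine similarity
    n = neural_emb / np.linalg.norm(neural_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = n @ a.T / temperature          # (B, B); positives on the diagonal
    labels = np.arange(logits.shape[0])

    def xent(l):
        # numerically stable log-softmax cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average both alignment directions, as in CLIP-style training
    return 0.5 * (xent(logits) + xent(logits.T))
```

Under this sketch, correctly paired embeddings yield a lower loss than mismatched ones, which is the property the contrastive alignment step relies on.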