[2602.22555] Autoregressive Visual Decoding from EEG Signals
Summary
The paper presents AVDE, a framework for decoding visual information from EEG signals that addresses two key challenges: bridging the EEG-image modality gap and reducing computational cost.
Why It Matters
This research proposes a more efficient method for visual decoding from EEG signals, which could strengthen brain-computer interface (BCI) applications. By reducing computational overhead while improving accuracy, it opens new avenues for real-time use in neuroscience and assistive technologies.
Key Takeaways
- AVDE framework improves visual decoding from EEG signals.
- Utilizes contrastive learning to align EEG and image representations.
- Employs an autoregressive generative model for efficient image reconstruction.
- Outperforms existing methods while using only 10% of the parameters.
- Demonstrates the hierarchical nature of human visual perception in generative processes.
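The contrastive alignment mentioned above can be sketched as a CLIP-style symmetric InfoNCE loss over paired EEG and image embeddings. The paper does not spell out the exact loss or temperature, so the details below (NumPy for clarity, temperature 0.07) are illustrative assumptions rather than AVDE's actual implementation:

```python
import numpy as np

def contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning EEG and image embeddings.

    eeg_emb, img_emb: (batch, dim) arrays; matching rows are positive pairs.
    Illustrative sketch -- the actual loss/temperature in AVDE are assumptions.
    """
    # L2-normalize so the dot product is cosine similarity
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # positive pairs sit on the diagonal

    def cross_entropy(l, y):
        # stable log-softmax over rows, then pick the target column
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average EEG->image and image->EEG retrieval losses
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each EEG embedding toward its paired image embedding and pushes it away from the other images in the batch, which is what aligns the two modalities.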
Computer Science > Machine Learning
arXiv:2602.22555 (cs) [Submitted on 26 Feb 2026]
Title: Autoregressive Visual Decoding from EEG Signals
Authors: Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye
Abstract: Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data: they typically rely on complex, multi-stage adaptation processes, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This...
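The "next-scale prediction" loop in the abstract can be illustrated as coarse-to-fine generation: the EEG embedding acts as the coarsest token map, and each finer map is predicted conditioned on everything generated so far. The scale schedule, function names, and the placeholder predictor below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def next_scale_decode(eeg_embedding, predict_scale, scales=(1, 2, 4, 8, 16)):
    """Coarse-to-fine autoregressive decoding ("next-scale prediction").

    The EEG embedding serves as the coarsest (1x1) token map; each finer
    s x s map is predicted conditioned on all maps generated so far.
    Scale schedule and names here are illustrative assumptions.
    """
    token_maps = [np.asarray(eeg_embedding).reshape(1, 1, -1)]  # coarsest level
    for s in scales[1:]:
        # stand-in for the trained transformer: emits the s x s token map
        token_maps.append(predict_scale(token_maps, s))
    return token_maps

def dummy_predict(maps, s):
    """Placeholder predictor: nearest-neighbour upsample of the last map."""
    prev = maps[-1]
    factor = s // prev.shape[0]
    return prev.repeat(factor, axis=0).repeat(factor, axis=1)

maps = next_scale_decode(np.ones(4), dummy_predict)
print([m.shape for m in maps])  # token maps grow 1x1 -> 2x2 -> ... -> 16x16
```

In the real system the placeholder predictor would be the transformer operating on VQ-VAE token indices; the point of the sketch is the autoregression over scales rather than over individual pixels or patches.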