[2602.21819] SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance

Summary

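The paper presents SemVideo, a framework that reconstructs videos from fMRI-recorded brain activity using hierarchical semantic guidance. It targets two recurring problems in fMRI-to-video reconstruction: inconsistent appearance of salient objects across frames and poor temporal coherence.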

Why It Matters

Understanding how the brain processes visual information is crucial for advancements in neuroscience and AI. SemVideo's approach could enhance our ability to decode and interpret brain signals, with implications for both medical research and AI development.

Key Takeaways

  • SemVideo improves fMRI-to-video reconstruction by addressing visual representation inconsistencies and temporal coherence issues.
  • The framework utilizes hierarchical semantic cues to enhance alignment and motion adaptation during video reconstruction.
  • According to the authors' experiments, SemVideo sets a new state of the art on established datasets.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.21819 (cs)

[Submitted on 25 Feb 2026]

Title: SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance

Authors: Minghan Yang, Lan Yang, Ke Li, Honggang Zhang, Kaiyue Pang, Yizhe Song

Abstract: Reconstructing dynamic visual experiences from brain activity provides a compelling avenue for exploring the neural mechanisms of human visual perception. While recent progress in fMRI-based image reconstruction has been notable, extending this success to video reconstruction remains a significant challenge. Current fMRI-to-video reconstruction approaches consistently encounter two major shortcomings: (i) inconsistent visual representations of salient objects across frames, leading to appearance mismatches; (ii) poor temporal coherence, resulting in motion misalignment or abrupt frame transitions.

To address these limitations, we introduce SemVideo, a novel fMRI-to-video reconstruction framework guided by hierarchical semantic information. At the core of SemVideo is SemMiner, a hierarchical guidance module that constructs three levels of semantic cues from the original video stimulus: static anchor descriptions, motion-oriented narratives, and holistic summaries. Leveraging this semantic guidance,...
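The abstract names SemMiner's three cue levels but, as excerpted here, does not say how they are represented or produced. As a reading aid only, the three levels could be pictured as one record per video stimulus. The sketch below is an assumption for illustration: the class name `SemanticCues`, its field names, and the example strings are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SemanticCues:
    """Hypothetical container for the three cue levels named in the abstract."""

    static_anchor: str      # "static anchor description" level
    motion_narrative: str   # "motion-oriented narrative" level
    holistic_summary: str   # "holistic summary" level

    def levels(self) -> Dict[str, str]:
        """Return the cues keyed by level name, in the order the abstract lists them."""
        return {
            "static_anchor": self.static_anchor,
            "motion_narrative": self.motion_narrative,
            "holistic_summary": self.holistic_summary,
        }


# Made-up example text; the real cues would be derived from the video stimulus.
cues = SemanticCues(
    static_anchor="a brown dog on a beach",
    motion_narrative="the dog runs toward the water",
    holistic_summary="a dog plays at the shoreline",
)
for level, text in cues.levels().items():
    print(f"{level}: {text}")
```

How these cues are generated and how they condition the reconstruction is not covered by the truncated abstract, so the sketch stops at the data structure.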
