[2512.08829] InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models



Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.08829 (cs)
Submitted on 9 Dec 2025 (v1); last revised 31 Mar 2026 (this version, v2)

Title: InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
Authors: Hongyuan Tao, Bencheng Liao, Shaoyu Chen, Haoran Yin, Qian Zhang, Wenyu Liu, Xinggang Wang

Abstract: Vision-Language Models (VLMs) are increasingly tasked with ultra-long multimodal understanding. While linear architectures offer constant computation and memory footprints, they often struggle with high-frequency visual perception compared to standard Transformers. To bridge this gap, we introduce InfiniteVL. We first develop a hybrid base model called InfiniteVL-Base that interleaves a small fraction of Full Attention layers with Gated DeltaNet. Empowered by a tailored distillation and fine-tuning strategy, InfiniteVL-Base matches the fundamental multimodal performance of equivalent Transformers while achieving a 1.7× decoding speedup. However, the quadratic complexity of the retained Full Attention inevitably becomes an efficiency bottleneck when scaling to ultra-long context. To break this barrier, we propose a novel Long-Sequence Architectural Fine-Tuni...
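The hybrid design described in the abstract, a small fraction of Full Attention layers interleaved with linear-attention (Gated DeltaNet) layers, can be sketched as a layer schedule. This is a minimal illustrative sketch only: the function name and the 1-in-4 interleaving ratio are assumptions, not the paper's published configuration.

```python
def hybrid_layer_schedule(num_layers: int, full_attn_every: int = 4) -> list:
    """Assign each decoder layer either linear attention (Gated DeltaNet-style,
    constant per-token cost) or full softmax attention (quadratic in context).

    Placing one full-attention layer every `full_attn_every` layers is an
    illustrative choice; the paper's actual ratio may differ.
    """
    schedule = []
    for i in range(num_layers):
        if (i + 1) % full_attn_every == 0:
            schedule.append("full_attention")
        else:
            schedule.append("linear_attention")
    return schedule


# Example: an 8-layer stack with a 1-in-4 interleave keeps only
# 2 full-attention layers, so most decoding cost stays linear.
layers = hybrid_layer_schedule(8)
```

Under such a schedule, decoding memory grows with context length only in the few full-attention layers, which is why the abstract notes that the retained Full Attention eventually becomes the bottleneck at ultra-long context.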

Originally published on April 01, 2026. Curated by AI News.


