[2602.12675] SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Summary

The paper presents SLA2, a Sparse-Linear Attention method that improves video generation efficiency by introducing a learnable routing mechanism, a direct sparse-linear formulation with a learnable combination ratio, and quantization-aware fine-tuning for low-bit attention.

Why It Matters

SLA2 addresses two limitations of the original SLA: its heuristic assignment of computations to the sparse or linear branch and its mismatch with a direct sparse-linear decomposition. Fixing these improves computational efficiency while preserving output quality in video diffusion, which matters for researchers and practitioners who want to cut attention cost without sacrificing generation quality.

Key Takeaways

  • SLA2 introduces a learnable router that dynamically selects whether each attention computation uses the sparse or the linear branch.
  • The method achieves 97% attention sparsity and an 18.6x attention speedup on video diffusion models.
  • Low-bit attention is introduced via quantization-aware fine-tuning to reduce quantization error.
  • SLA2 improves video generation quality and efficiency over the original SLA.
  • The proposed formulation combines the sparse and linear branches directly through a learnable ratio (see the sketch after this list).
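
A minimal sketch can make that combination concrete. The PyTorch code below is a hypothetical illustration, not the authors' implementation: it mixes a masked (sparse) softmax-attention branch with a kernelized linear-attention branch through a learnable per-head ratio. The class name, the elu(x)+1 feature map, and the externally supplied sparse mask are all assumptions; in SLA2 the routing that produces such a mask is itself learned.

```python
# Hypothetical sketch of combining a sparse and a linear attention branch
# with a learnable ratio. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLinearAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # One learnable mixing ratio per head (sigmoid keeps it in [0, 1]).
        self.alpha = nn.Parameter(torch.zeros(num_heads))

    def forward(self, q, k, v, sparse_mask):
        # q, k, v: (batch, heads, seq, head_dim)
        # sparse_mask: boolean (batch, heads, seq, seq); True = keep this entry.
        # Assumes every query keeps at least one key, otherwise softmax yields NaN.
        scale = self.head_dim ** -0.5

        # Sparse branch: ordinary softmax attention restricted to the mask.
        scores = (q @ k.transpose(-2, -1)) * scale
        scores = scores.masked_fill(~sparse_mask, float("-inf"))
        sparse_out = torch.softmax(scores, dim=-1) @ v

        # Linear branch: kernelized attention with an elu(x) + 1 feature map.
        q_f, k_f = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bhnd,bhne->bhde", k_f, v)
        norm = torch.einsum("bhnd,bhd->bhn", q_f, k_f.sum(dim=2)).clamp(min=1e-6)
        linear_out = torch.einsum("bhnd,bhde->bhne", q_f, kv) / norm.unsqueeze(-1)

        # Learnable ratio combining the two branches per head.
        a = torch.sigmoid(self.alpha).view(1, -1, 1, 1)
        return a * sparse_out + (1 - a) * linear_out
```

In the real method the routing and the sparse/linear split are learned jointly and the sparse branch would run a block-sparse kernel; the dense mask here only keeps the example short.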

Computer Science > Machine Learning
arXiv:2602.12675 (cs) [Submitted on 13 Feb 2026]

Title: SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Authors: Jintao Zhang, Haoxu Wang, Kai Jiang, Kaiwen Zheng, Youhe Jiang, Ion Stoica, Jianfei Chen, Jun Zhu, Joseph E. Gonzalez

Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identify a mismatch between SLA and a direct decomposition into sparse and linear attention. We propose SLA2, which introduces (I) a learnable router that dynamically selects whether each attention computation should use sparse or linear attention, (II) a more faithful and direct sparse-linear attention formulation that uses a learnable ratio to combine the sparse and linear attention branches, and (III) a sparse + low-bit attention design, where low-bit attention is introduced via quantization-aware fine-tuning to reduce quantization error. Experiments show that on video diffusion models, SLA2 can achieve 97% attention sparsity and deliver an 18.6x att...
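
Contribution (III) pairs sparse attention with low-bit attention and uses quantization-aware fine-tuning (QAT) to absorb the resulting rounding error. As a rough, hypothetical illustration only (not the paper's actual quantization scheme), the sketch below fake-quantizes Q and K to a low bit-width during fine-tuning and relies on a straight-through estimator so gradients still flow to the full-precision parameters; the 4-bit default, symmetric per-tensor scaling, and the function names are assumptions.

```python
# Hypothetical sketch of quantization-aware fine-tuning (QAT) for low-bit
# attention inputs via fake quantization. Not the paper's actual scheme.
import torch


def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward pass uses the quantized value; backward pass sees the identity.
    return x + (x_q - x).detach()


def low_bit_attention(q, k, v, num_bits: int = 4):
    # Quantize Q and K before the score computation so fine-tuning can adapt
    # the model to the rounding error introduced by low-bit attention.
    q_q, k_q = fake_quantize(q, num_bits), fake_quantize(k, num_bits)
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax((q_q @ k_q.transpose(-2, -1)) * scale, dim=-1)
    return attn @ v
```

Because the rounding happens inside the forward pass during fine-tuning, the model can adapt to tolerate low-bit attention, which is the role QAT plays in the abstract.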
