[2504.00037] ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Summary

The paper introduces ViT-Linearizer, a framework that distills knowledge from Vision Transformers (ViTs) into efficient linear-time models, addressing the challenges of quadratic complexity in high-resolution vision tasks.

Why It Matters

As computer vision models grow more complex, the quadratic scaling of self-attention in ViTs poses significant challenges for real-world deployment, especially at high resolution. ViT-Linearizer addresses this by enabling faster inference without sacrificing accuracy, making advanced vision models practical for large-scale visual tasks.

Key Takeaways

  • ViT-Linearizer distills knowledge from ViTs to linear-time models.
  • The framework uses activation matching and masked prediction for effective distillation.
  • It significantly improves inference speed for high-resolution tasks.
  • Achieves competitive performance on ImageNet with 84.3% top-1 accuracy.
  • Bridges theoretical efficiency with practical applications in large-scale visual tasks.
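The activation-matching idea above can be sketched as follows. This is an illustrative NumPy interpretation, not the authors' implementation: the choice of cosine-similarity dependency matrices and a plain MSE penalty are assumptions made here for concreteness.

```python
import numpy as np

def activation_matching_loss(student_feats, teacher_feats):
    """Illustrative sketch: align the student's token-wise dependency
    (pairwise similarity) matrix with the teacher's, so the quadratic
    attention structure is transferred without the student computing
    attention at inference time.

    student_feats, teacher_feats: (num_tokens, dim) token feature arrays.
    """
    def dependency_matrix(x):
        x = x / np.linalg.norm(x, axis=-1, keepdims=True)  # L2-normalize tokens
        return x @ x.T                                     # (N, N) cosine similarities

    s = dependency_matrix(student_feats)
    t = dependency_matrix(teacher_feats)
    return float(np.mean((s - t) ** 2))  # MSE between dependency matrices
```

The loss is zero when the student's pairwise token similarities exactly match the teacher's, and grows as the two dependency structures diverge.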

Computer Science > Computer Vision and Pattern Recognition

arXiv:2504.00037 (cs) [Submitted on 30 Mar 2025 (v1), last revised 26 Feb 2026 (this version, v2)]

Title: ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Authors: Guoyizhe Wei, Rama Chellappa

Abstract: Vision Transformers (ViTs) have delivered remarkable progress through global self-attention, yet their quadratic complexity can become prohibitive for high-resolution inputs. In this work, we present ViT-Linearizer, a cross-architecture distillation framework that transfers rich ViT representations into a linear-time, recurrent-style model. Our approach leverages 1) activation matching, an intermediate constraint that encourages the student to align its token-wise dependencies with those produced by the teacher, and 2) masked prediction, a contextual reconstruction objective that requires the student to predict the teacher's representations for unseen (masked) tokens, to effectively distill the quadratic self-attention knowledge into the student while maintaining efficient complexity. Empirically, our method provides notable speedups, particularly for high-resolution tasks, significantly addressing the hardware challenges in inference. Additionally, it elevates Mamba-based architectures' performance on standard vision ...
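The masked-prediction objective from the abstract could be sketched as below. Again this is a hedged illustration in NumPy, not the paper's code: the `random_mask` helper, the masking ratio, and the plain MSE regression target are assumptions introduced here.

```python
import numpy as np

def masked_prediction_loss(student_preds, teacher_feats, mask):
    """Illustrative sketch: the student sees only the unmasked tokens and
    must regress the teacher's representations at the masked positions,
    which forces it to model global context the way the teacher does.

    student_preds, teacher_feats: (num_tokens, dim) feature arrays.
    mask: boolean (num_tokens,), True where the token was hidden.
    """
    diff = student_preds[mask] - teacher_feats[mask]
    return float(np.mean(diff ** 2))  # MSE on masked tokens only

def random_mask(num_tokens, ratio, rng):
    """Hypothetical helper: mask a fixed fraction of token positions."""
    idx = rng.choice(num_tokens, size=int(num_tokens * ratio), replace=False)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[idx] = True
    return mask
```

Errors at unmasked positions contribute nothing to this loss; only the reconstructed (masked) tokens are penalized.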
