[2602.20309] QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models


Summary

QuantVLA introduces a novel post-training quantization framework for Vision-Language-Action models, enhancing efficiency without additional training.

Why It Matters

As AI models grow in complexity, their deployment becomes increasingly resource-intensive. QuantVLA addresses these challenges by enabling efficient model quantization, which is crucial for practical applications in embodied intelligence, especially under stringent compute and memory constraints.

Key Takeaways

  • QuantVLA is, to the authors' knowledge, the first post-training quantization method for Vision-Language-Action models.
  • It reports significant memory savings (about 70%) and a 1.22x inference speedup without requiring additional training.
  • The framework utilizes a small unlabeled calibration buffer for efficient quantization.
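The calibration-buffer idea above can be illustrated with a minimal sketch: a handful of unlabeled activation batches fixes a symmetric per-tensor int8 scale, with no labels or gradients involved. The `calibrate_scale` and `quantize` helpers below are illustrative stand-ins, not the paper's actual procedure.

```python
import numpy as np

def calibrate_scale(activations, num_bits=8):
    """Per-tensor symmetric scale from a small unlabeled calibration buffer."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    max_abs = max(np.abs(a).max() for a in activations)
    return max_abs / qmax

def quantize(x, scale, num_bits=8):
    """Round to the nearest integer grid point and clip to the int8 range."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy calibration buffer: a few unlabeled activation batches.
rng = np.random.default_rng(0)
buffer = [rng.normal(size=(4, 16)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(buffer)

x = buffer[0]
x_hat = dequantize(quantize(x, scale), scale)
print("max abs round-trip error:", np.abs(x - x_hat).max())
```

Because the scale is chosen from the buffer's maximum magnitude, every calibrated value round-trips with error at most half a quantization step.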

Computer Science > Machine Learning

arXiv:2602.20309 (cs) [Submitted on 23 Feb 2026]

Title: QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

Authors: Jingxuan Zhang, Yunta Hsieh, Zhongwei Wang, Haokun Lin, Xin Wang, Ziqi Wang, Yingtie Lei, Mi Zhang

Abstract: Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks, we introduce QuantVLA, a training-free post-training quantization (PTQ) framework that, to our knowledge, is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head. QuantVLA incorporates three scale-calibrated components:

  • a selective quantization layout that integerizes all linear layers in both the language backbone and the DiT while keeping attention projections in floating point to preserve the original operator schedule;
  • attention temperature matching, a lightweight per-head scaling mechanism that stabilizes attention logits and is folded into the dequantization scales at inference; and
  • output head balancing, a per-layer r...
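The abstract's second component, attention temperature matching, can be sketched as follows. In this toy version the per-head temperature is chosen so that the standard deviation of the quantized attention logits matches the full-precision logits, and "folding" is shown by multiplying the logits by that factor before the softmax (equivalent to scaling the query dequantization scale). The matching objective and helper names are assumptions; the excerpt does not give the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def match_temperature(logits_fp, logits_q):
    """Hypothetical objective: align logit standard deviations per head."""
    return logits_fp.std() / (logits_q.std() + 1e-8)

rng = np.random.default_rng(1)
H, T, D = 2, 5, 8                                  # heads, tokens, head dim
q_fp = rng.normal(size=(H, T, D))
k_fp = rng.normal(size=(H, T, D))

for h in range(H):
    logits_fp = q_fp[h] @ k_fp[h].T / np.sqrt(D)
    # Crude int8 round-trip of the queries as a stand-in for real quantization.
    s = np.abs(q_fp[h]).max() / 127
    q_deq = np.round(q_fp[h] / s) * s
    logits_q = q_deq @ k_fp[h].T / np.sqrt(D)
    t = match_temperature(logits_fp, logits_q)
    # Folding: t multiplies the logits, i.e. it can be absorbed into the
    # query dequantization scale so no extra op runs at inference.
    attn = softmax(t * logits_q)
```

With such a mild int8 round-trip the matched temperature stays close to 1; its role is to correct the systematic logit-variance drift that a coarser quantizer would introduce.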
