[2602.13710] HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
Summary
The paper presents HBVLA, a framework for 1-bit post-training quantization of Vision-Language-Action models, enhancing efficiency while maintaining performance on resource-constrained devices.
Why It Matters
As AI models grow larger, deploying them on limited hardware becomes a significant challenge. HBVLA addresses this by enabling 1-bit quantization without substantial performance loss, making advanced AI applications more practical in real-world settings, especially in robotics.
Key Takeaways
- HBVLA improves the efficiency of Vision-Language-Action models through 1-bit quantization.
- The framework retains high performance, with quantized models achieving over 92% of full-precision performance.
- Utilizes a policy-aware enhanced Hessian to identify critical weights for action generation.
- Demonstrates robust deployability on hardware-limited platforms, crucial for robotics.
- Provides a practical foundation for ultra-low-bit quantization, expanding the usability of AI models.
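The takeaways above center on group-wise 1-bit quantization: each small group of weights is reduced to signs plus one full-precision scale. A minimal sketch of that general idea follows (not HBVLA's exact procedure, which additionally splits salient from non-salient weights and quantizes in a transformed domain); `binarize_groupwise` is a hypothetical helper name.

```python
import numpy as np

def binarize_groupwise(w, group_size=64):
    """Group-wise 1-bit quantization sketch: within each group,
    keep only sign(w) plus one shared scale alpha = mean(|w|),
    which minimizes the L2 reconstruction error for sign codes."""
    w = np.asarray(w, dtype=np.float64)
    pad = (-len(w)) % group_size          # pad so length divides evenly
    groups = np.pad(w, (0, pad)).reshape(-1, group_size)
    scales = np.abs(groups).mean(axis=1, keepdims=True)
    q = scales * np.sign(groups)          # dequantized weights
    return q.reshape(-1)[:len(w)]

rng = np.random.default_rng(0)
w = rng.standard_normal(256)
wq = binarize_groupwise(w, group_size=64)
# storage cost: 1 bit per weight + one FP scale per 64 weights
```

Because each group gets its own optimal scale, the reconstruction error can only match or beat a single global scale, which is why finer groups are a common lever in ultra-low-bit quantization.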
Computer Science > Machine Learning
arXiv:2602.13710 (cs) [Submitted on 14 Feb 2026]
Title: HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
Authors: Xin Yan, Zhenglin Wan, Feiyang Ye, Xingrui Yu, Hangyu Du, Yang You, Ivor Tsang
Abstract: Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing methods fail to narrow the distribution gap between binarized and full-precision weights, causing quantization errors to accumulate under long-horizon closed-loop execution and severely degrade actions. To fill this gap, we propose HBVLA, a VLA-tailored binarization framework. First, we use a policy-aware enhanced Hessian to identify weights that are truly critical for action generation. Then, we employ a sparse orthogonal transform for non-salient weights to induce a low-entropy intermediate state. Finally, we quantize both salient and non-salient weights in the Haar domain with group-wise 1-bit quantization. We have evaluated our approach on different VLAs: on LIBERO, quantized OpenVLA-OFT retains 92.2% of full-precision performance; on Sim...
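The abstract's first step, identifying action-critical (salient) weights with a Hessian, can be illustrated with the generic second-order sensitivity score s_i = w_i^2 * H_ii used by many post-training quantization methods. The paper's policy-aware enhanced Hessian is more elaborate; the sketch below, with the hypothetical helper `saliency_topk`, only shows the ranking-and-masking mechanics.

```python
import numpy as np

def saliency_topk(W, H_diag, k):
    """Rank weights by s_i = w_i^2 * H_ii (diagonal-Hessian
    sensitivity) and return a boolean mask flagging the top-k
    most salient weights. Generic stand-in, not HBVLA's exact
    policy-aware enhanced Hessian."""
    s = (W.ravel() ** 2) * np.asarray(H_diag).ravel()
    idx = np.argsort(s)[::-1][:k]        # indices of largest scores
    mask = np.zeros(W.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(W.shape)

# Toy example: with a uniform Hessian diagonal, saliency reduces
# to weight magnitude, so the -3.0 entry is flagged first.
W = np.array([[0.1, 2.0],
              [0.5, -3.0]])
H_diag = np.ones(4)
mask = saliency_topk(W, H_diag, k=1)
```

Salient weights flagged this way would be protected (e.g. kept at higher precision or quantized more carefully), while the remaining weights go through the cheaper transform-then-binarize path.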