[2602.07849] LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
Summary
The paper presents LQA, a lightweight quantized-adaptive framework designed to make Vision-Language Models (VLMs) deployable on edge devices, addressing both resource constraints and performance degradation under distribution shifts.
Why It Matters
As edge computing becomes increasingly vital for AI applications, optimizing Vision-Language Models for resource-constrained environments is essential. LQA provides a practical solution that balances efficiency and performance, making advanced AI more accessible on everyday devices.
Key Takeaways
- LQA combines modality-aware quantization with gradient-free test-time adaptation.
- The framework improves overall adaptation performance by 4.5% while using less memory than full-precision models.
- LQA outperforms gradient-based test-time adaptation methods, achieving up to 19.9x lower memory usage across seven open-source datasets.
- The approach is designed for robust and efficient deployment of VLMs on edge devices.
- By keeping inference on-device, LQA supports privacy-preserving AI applications while minimizing resource demands.
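The paper does not spell out its gradient-free adaptation mechanism in the abstract, but one common family of training-free test-time adaptation for VLMs maintains a small cache of high-confidence test features per pseudo-label and blends cache similarity into the zero-shot logits, so the model adapts without any backpropagation. The sketch below illustrates that general idea; the class name, cache size, and blending weight are illustrative assumptions, not LQA's actual design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class GradientFreeTTA:
    """Illustrative training-free adaptation: cache confident test features
    per pseudo-label and add their similarity to the zero-shot logits.
    No gradients or optimizer state are needed (hypothetical sketch)."""

    def __init__(self, num_classes, cap=3, alpha=0.5, conf_thresh=0.7):
        self.cache = {c: [] for c in range(num_classes)}
        self.cap = cap                  # max cached features per class
        self.alpha = alpha              # weight of the cache correction
        self.conf_thresh = conf_thresh  # only cache confident predictions

    def adapt(self, feat, zero_shot_logits):
        probs = softmax(zero_shot_logits)
        pred, conf = int(probs.argmax()), float(probs.max())
        # Gradient-free "learning": store the feature if prediction is confident.
        if conf > self.conf_thresh:
            self.cache[pred].append(feat)
            self.cache[pred] = self.cache[pred][-self.cap :]
        # Cache logits: cosine similarity of the query to each class prototype.
        cache_logits = np.zeros_like(zero_shot_logits)
        for c, feats in self.cache.items():
            if feats:
                proto = np.mean(feats, axis=0)
                proto /= np.linalg.norm(proto) + 1e-8
                cache_logits[c] = feat @ proto
        return zero_shot_logits + self.alpha * cache_logits
```

Because the only state is a bounded feature cache, memory stays flat at test time, which is the property that makes this style of adaptation attractive on edge hardware.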
Computer Science > Artificial Intelligence, arXiv:2602.07849 (cs)
Submitted on 8 Feb 2026 (v1), last revised 16 Feb 2026 (this version, v2)
Authors: Xin Wang, Hualin Zhou, Sheng Guang Wang, Ting Dang, Yu Zhang, Hong Jia, Tao Gu
Abstract
Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware quantization strategy with gradient-free test-time adaptation. We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware. Experiments across both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods, achieving up to 19.9x lower memory usage across seven open-source datasets. These results dem...
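The abstract names Selective Hybrid Quantization (SHQ) but does not detail it. A plausible reading of "selective hybrid" quantization is quantizing most weight tensors to int8 while keeping sensitivity-critical tensors (for example, cross-modal projection layers) in full precision. The sketch below shows that general pattern with symmetric per-tensor int8 quantization; the function names, the layer names, and the selection rule are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map the tensor's max
    absolute value to 127 and round to the nearest integer level."""
    scale = max(np.abs(w).max() / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

def selective_hybrid_quantize(weights, keep_fp):
    """Hypothetical selective scheme: quantize every tensor to int8
    except those named in keep_fp, which stay in full precision."""
    out = {}
    for name, w in weights.items():
        if name in keep_fp:
            out[name] = ("fp32", w)
        else:
            out[name] = ("int8", quantize_int8(w))
    return out
```

Int8 storage is 4x smaller than fp32 per tensor, and sparing only a few sensitive tensors keeps most of that saving while limiting accuracy loss, which is the trade-off a hybrid scheme targets.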