[2603.19296] TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
Computer Science > Machine Learning
arXiv:2603.19296 (cs)
[Submitted on 11 Mar 2026]

Title: TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
Authors: Toshiaki Koike-Akino, Jing Liu, Ye Wang

Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these methods rely heavily on calibration data, domain-shift issues may arise on unseen downstream tasks. To resolve this issue, we propose a test-time quantization (TTQ) framework that compresses large models on the fly at inference time. Through efficient online calibration, instant activation-aware quantization adapts to every prompt regardless of the downstream task, while still achieving inference speedup. Experiments demonstrate that TTQ improves quantization performance over state-of-the-art baselines.

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as: arXiv:2603.19296 [cs.LG] (or arXiv:2603.19296v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.19296 (arXiv-issued DOI via DataCite)

Submission history
From: Toshiaki Koike-Akino
[v1] Wed, 11 Mar 2026 02:08:11 UTC (384 KB)
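The abstract does not specify the calibration mechanism, but one plausible reading is an AWQ-style per-channel rescaling where the scales are computed from the current prompt's activations rather than from an offline calibration set. The sketch below illustrates that assumed mechanism for a single linear layer; ttq_quantize_layer, the use of mean absolute activation as the channel importance, and the symmetric per-row quantizer are all hypothetical choices for illustration, not the paper's actual method.

    import torch

    def ttq_quantize_layer(weight, prompt_activations, n_bits=4, eps=1e-6):
        """Hypothetical sketch: activation-aware quantization of one linear
        layer, calibrated online from the current prompt.

        weight: (out_features, in_features) fp16/fp32 weight matrix.
        prompt_activations: (tokens, in_features) inputs to this layer,
            collected during a single calibration pass over the prompt.
        """
        # Per-input-channel importance from the current prompt; mean
        # absolute activation is one simple, assumed choice.
        act_scale = prompt_activations.abs().mean(dim=0).clamp(min=eps)

        # Rescale weight columns so channels seeing large activations are
        # quantized on a relatively finer grid (AWQ-style intuition).
        w_scaled = weight * act_scale  # broadcasts over input channels

        # Symmetric per-output-row uniform quantization of the scaled weights.
        qmax = 2 ** (n_bits - 1) - 1
        w_absmax = w_scaled.abs().amax(dim=1, keepdim=True).clamp(min=eps)
        step = w_absmax / qmax
        w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax)

        # Folding the activation scale back out returns weights whose
        # quantization error is smallest on high-activation channels.
        return w_q * step / act_scale

Under this reading, the online calibration cost is one extra pass over the prompt to gather activation statistics, after which the quantized weights serve the whole generation; whether TTQ amortizes the cost this way, or recalibrates incrementally, is not stated in the abstract.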