[2603.19296] TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly

arXiv - Machine Learning 3 min read

Computer Science > Machine Learning

arXiv:2603.19296 (cs)
[Submitted on 11 Mar 2026]

Title: TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly

Authors: Toshiaki Koike-Akino, Jing Liu, Ye Wang

Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since these methods rely heavily on calibration data, domain shift issues may arise for unseen downstream tasks. We propose a test-time quantization (TTQ) framework which compresses large models on the fly at inference time to resolve this issue. With an efficient online calibration, instant activation-aware quantization can adapt to every prompt regardless of the downstream task, while still achieving inference speedup. Several experiments demonstrate that TTQ can improve the quantization performance over state-of-the-art baselines.

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as: arXiv:2603.19296 [cs.LG] (or arXiv:2603.19296v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.19296 (arXiv-issued DOI via DataCite)
Submission history: From Toshiaki Koike-Akino, [v1] Wed, 11 Mar 2026 02:08:11 UTC (384 KB)
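To make the idea of calibrating on the prompt itself more concrete, here is a minimal NumPy sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes an AWQ-style recipe instead: input channels of a weight matrix are rescaled by the activation magnitudes measured on the current prompt, then quantized with plain round-to-nearest. The function names, the 4-bit width, and the exponent `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization.

    Returns the dequantized (float) weights so the error is easy to inspect.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def activation_aware_quantize(w, x, bits=4, alpha=0.5):
    """Quantize w (out, in) using statistics of the prompt activations x (tokens, in).

    Assumption: AWQ-style channel scaling, used here only as a stand-in for
    whatever online calibration the paper actually performs.
    """
    act = np.abs(x).mean(axis=0)               # per-input-channel magnitude
    s = np.maximum(act, 1e-8) ** alpha         # larger activations get larger scales
    s = s / np.sqrt(s.max() * s.min())         # keep the scales centred around 1
    w_q = quantize_rtn(w * s, bits)            # quantize the rescaled weights
    return w_q / s                             # fold the scaling back into the weights


# Toy comparison on random data with a few high-magnitude input channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
x = rng.standard_normal((16, 128))
x[:, :4] *= 20.0                               # simulated outlier channels

y_ref = x @ w.T
err_rtn = np.linalg.norm(y_ref - x @ quantize_rtn(w).T)
err_act = np.linalg.norm(y_ref - x @ activation_aware_quantize(w, x).T)
print(f"output error, plain RTN:        {err_rtn:.3f}")
print(f"output error, activation-aware: {err_act:.3f}")
```

Because `x` here is the prompt being served rather than a fixed calibration set, the scales are recomputed for every request. That is the property the abstract emphasizes, though a practical implementation would need the calibration step to be far cheaper than the inference it accelerates.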

Originally published on March 23, 2026. Curated by AI News.
