[2603.19296] TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
Computer Science > Machine Learning
arXiv:2603.19296 (cs)
[Submitted on 11 Mar 2026]

Title: TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly
Authors: Toshiaki Koike-Akino, Jing Liu, Ye Wang

Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these methods rely heavily on calibration data, domain-shift issues may arise on unseen downstream tasks. To resolve this issue, we propose a test-time quantization (TTQ) framework that compresses large models on the fly at inference time. Through efficient online calibration, instant activation-aware quantization adapts to every prompt regardless of the downstream task, while still achieving inference speedup. Experiments demonstrate that TTQ improves quantization performance over state-of-the-art baselines.

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as: arXiv:2603.19296 [cs.LG] (or arXiv:2603.19296v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.19296 (arXiv-issued DOI via DataCite)

Submission history
From: Toshiaki Koike-Akino
[v1] Wed, 11 Mar 2026 02:08:11 UTC (384 KB)
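The abstract does not specify the calibration mechanism, but one plausible reading is an AWQ-style per-channel rescaling where the scales are computed from the current prompt's activations rather than from an offline calibration set. The sketch below illustrates that assumed mechanism for a single linear layer; ttq_quantize_layer, the use of mean absolute activation as the channel importance, and the symmetric per-row quantizer are all hypothetical choices for illustration, not the paper's actual method.

    import torch

    def ttq_quantize_layer(weight, prompt_activations, n_bits=4, eps=1e-6):
        """Hypothetical sketch: activation-aware quantization of one linear
        layer, calibrated online from the current prompt.

        weight: (out_features, in_features) fp16/fp32 weight matrix.
        prompt_activations: (tokens, in_features) inputs to this layer,
            collected during a single calibration pass over the prompt.
        """
        # Per-input-channel importance from the current prompt; mean
        # absolute activation is one simple, assumed choice.
        act_scale = prompt_activations.abs().mean(dim=0).clamp(min=eps)

        # Rescale weight columns so channels seeing large activations are
        # quantized on a relatively finer grid (AWQ-style intuition).
        w_scaled = weight * act_scale  # broadcasts over input channels

        # Symmetric per-output-row uniform quantization of the scaled weights.
        qmax = 2 ** (n_bits - 1) - 1
        w_absmax = w_scaled.abs().amax(dim=1, keepdim=True).clamp(min=eps)
        step = w_absmax / qmax
        w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax)

        # Folding the activation scale back out returns weights whose
        # quantization error is smallest on high-activation channels.
        return w_q * step / act_scale

Under this reading, the online calibration cost is one extra pass over the prompt to gather activation statistics, after which the quantized weights serve the whole generation; whether TTQ amortizes the cost this way, or recalibrates incrementally, is not stated in the abstract.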