[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
Computer Science > Computation and Language
arXiv:2603.04411 (cs) [Submitted on 3 Feb 2026]

Title: One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
Authors: Liming Lu, Kaixi Qiu, Jiayu Zhou, Jushi Kai, Haoyan Zhang, Huanyu Wang, Jingwen Leng, Ziwei He, Zhouhan Lin

Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottleneck for efficient inference. While dimensionality reduction offers a promising compression avenue, existing approaches typically either necessitate prohibitively expensive pre-training from scratch or suffer from severe performance deterioration under high compression regimes. In this work, we propose DynaKV, a novel post-training framework for low-rank KV cache compression. To the best of our knowledge, DynaKV is the first method to dynamically allocate compression rates to individual tokens according to their semantic meaning, which allows it to achieve better fidelity at aggressive compression ratios. Extensive experiments demonstrate that our method consistently outperforms existing state-of-the-art compression techniques, achieving significant memory reduction while maintaining competitive generation quality. Furthermore, our approach is orthogonal to sequence-level p...
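To make the core idea concrete, the sketch below illustrates token-wise adaptive low-rank compression of a KV cache in NumPy. This is a minimal illustration of the general technique, not DynaKV's actual algorithm: the shared orthonormal `basis`, the two-level rank schedule, and the random `importance` scores standing in for "semantic meaning" are all assumptions for the example.

```python
import numpy as np

def compress_kv_tokenwise(K, V, basis, ranks):
    """Project each token's key/value onto its own top-r basis directions.

    K, V  : (seq_len, d) cached keys/values
    basis : (d, d) orthonormal basis (e.g. from an offline SVD of
            calibration keys) -- an assumption, not DynaKV's actual basis
    ranks : (seq_len,) per-token rank r_i, i.e. the adaptive compression rate
    Returns per-token coefficient lists (what would be stored) and a lossy
    key reconstruction (what attention would see at query time).
    """
    coeffs_k, coeffs_v = [], []
    recon_k = np.zeros_like(K)
    for i, r in enumerate(ranks):
        B = basis[:, :r]            # top-r directions kept for this token
        ck = B.T @ K[i]             # store r numbers instead of d
        coeffs_k.append(ck)
        coeffs_v.append(B.T @ V[i])
        recon_k[i] = B @ ck         # lossy reconstruction of the key
    return coeffs_k, coeffs_v, recon_k

# Toy usage: tokens with a high (hypothetical) importance score keep a
# larger rank; the rest are compressed more aggressively.
rng = np.random.default_rng(0)
d, n = 64, 8
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
basis, _, _ = np.linalg.svd(rng.standard_normal((d, d)))  # random orthonormal basis
importance = rng.random(n)
ranks = np.where(importance > 0.5, 48, 16)  # coarse two-level allocation
ck, cv, Kr = compress_kv_tokenwise(K, V, basis, ranks)
```

Storage per token drops from `d` floats to `r_i` floats per matrix, so the average compression ratio is set by how the per-token ranks are allocated rather than by one global rank.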