[2505.13109] FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Computer Science > Machine Learning
arXiv:2505.13109 (cs)
[Submitted on 19 May 2025 (v1), last revised 28 Feb 2026 (this version, v4)]

Title: FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Authors: Guangda Liu, Chengwei Li, Zhenyu Ning, Jing Lin, Yiwu Yao, Danning Ke, Minyi Guo, Jieru Zhao

Abstract: Large language models (LLMs) are widely deployed with rapidly expanding context windows to support increasingly demanding applications. However, long contexts pose significant deployment challenges, primarily due to the KV cache, whose size grows proportionally with context length. While KV cache compression methods have been proposed to address this issue, KV dropping methods incur considerable accuracy loss, and KV retrieval methods suffer from significant efficiency bottlenecks. We propose FreeKV, a training-free algorithm-system co-optimization framework to enhance KV retrieval efficiency while preserving accuracy. On the algorithm side, FreeKV introduces speculative retrieval to shift the KV selection and recall processes out of the critical path, combined with fine-grained correction to ensure accuracy. On the system side, FreeKV employs hybrid KV layouts across CPU and GPU memory to eliminate fragmented data transfers, and leverages double-buffered streamed recall to further improve efficiency, enabling ...
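The double-buffered streamed recall mentioned in the abstract follows the classic double-buffering pattern: while the current chunk of recalled KV entries is being consumed, the next chunk is fetched into an alternate buffer so transfer latency is hidden behind computation. The sketch below illustrates only this generic pattern with CPU threads; the function names and the thread-based overlap are illustrative assumptions, not the paper's CUDA-stream implementation.

```python
import threading

def double_buffered_recall(chunks, fetch, compute):
    """Process chunks while overlapping the fetch of chunk i+1
    with the computation on chunk i, using two alternating buffers.

    chunks  : list of chunk descriptors to recall
    fetch   : callable that loads one chunk (stands in for a CPU->GPU copy)
    compute : callable that consumes one fetched chunk
    """
    buffers = [None, None]
    results = []
    # Fetch the first chunk synchronously to prime the pipeline.
    buffers[0] = fetch(chunks[0])
    for i in range(len(chunks)):
        current = buffers[i % 2]
        prefetch_thread = None
        if i + 1 < len(chunks):
            # Start loading the next chunk into the *other* buffer
            # while the current chunk is being processed below.
            def _prefetch(idx=i + 1, slot=(i + 1) % 2):
                buffers[slot] = fetch(chunks[idx])
            prefetch_thread = threading.Thread(target=_prefetch)
            prefetch_thread.start()
        results.append(compute(current))
        if prefetch_thread is not None:
            prefetch_thread.join()  # next buffer is ready for the next step
    return results
```

In a real system the `fetch` step would be an asynchronous host-to-device copy on a dedicated stream and `compute` an attention kernel on another, but the buffer-alternation logic is the same.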