[2602.13165] Asynchronous Verified Semantic Caching for Tiered LLM Architectures

arXiv - AI · 4 min read

Summary

The paper introduces Krites, an asynchronous, LLM-judged caching policy that expands the static-cache coverage of large language model (LLM) serving systems without adding latency on the critical path, improving response accuracy in conversational and search tasks.

Why It Matters

As LLMs move into the critical path of search, assistance, and agentic workflows, effective semantic caching is crucial for controlling inference cost and latency. Krites eases the tradeoff that a single similarity threshold imposes between safe cache reuse and serving incorrect responses, making it a practical advance for tiered LLM deployments.

Key Takeaways

  • Krites expands static-cache coverage for LLMs without increasing serving latency.
  • Cached responses are verified asynchronously by an LLM judge, off the critical path, which preserves accuracy.
  • Simulations show up to a 3.9× increase in effective static-answer usage.

Computer Science > Information Retrieval
arXiv:2602.13165 (cs) [Submitted on 13 Feb 2026]

Title: Asynchronous Verified Semantic Caching for Tiered LLM Architectures
Authors: Asmit Kumar Singh, Haozhe Wang, Laxmi Naga Santosh Attaluri, Tak Chiam, Weihua Zhu

Abstract: Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline-vetted responses mined from logs, backed by a dynamic cache populated online. In practice, both tiers are commonly governed by a single embedding-similarity threshold, which induces a hard tradeoff: conservative thresholds miss safe reuse opportunities, while aggressive thresholds risk serving semantically incorrect responses. We introduce Krites, an asynchronous, LLM-judged caching policy that expands static coverage without changing serving decisions. On the critical path, Krites behaves exactly like a standard static-threshold policy. When the nearest static neighbor of the prompt falls just below the static threshold, Krites asynchronously invokes an LLM judge to verify whether the static response is acceptable for the new prompt. Approved matches are promo...
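The serving policy the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold values, the verification band below the static threshold, and the names (`TAU_STATIC`, `TAU_VERIFY`, `serve`, `judge_queue`) are assumptions for illustration only.

```python
import math

# Hypothetical thresholds; the paper does not publish concrete values.
TAU_STATIC = 0.90   # at/above this similarity: serve the cached response
TAU_VERIFY = 0.82   # in [TAU_VERIFY, TAU_STATIC): queue an async judge check

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def serve(prompt_emb, static_cache, judge_queue):
    """Critical path: behaves exactly like a plain static-threshold policy.

    Near misses are only recorded for off-path verification; the serving
    decision itself never changes.
    """
    best, best_sim = None, -1.0
    for entry in static_cache:
        sim = cosine(prompt_emb, entry["embedding"])
        if sim > best_sim:
            best, best_sim = entry, sim
    if best is not None and best_sim >= TAU_STATIC:
        return best["response"]          # cache hit: serve immediately
    if best is not None and best_sim >= TAU_VERIFY:
        # Just below the static threshold: hand the (prompt, candidate) pair
        # to an asynchronous LLM judge; a worker would later verify it and,
        # if approved, widen the static cache's effective coverage.
        judge_queue.append((prompt_emb, best))
    return None                          # fall through to dynamic tier / model
```

A background worker would drain `judge_queue`, ask an LLM judge whether the candidate response is acceptable for the new prompt, and act on approved matches off the critical path, so the extra judge latency is never paid by the user-facing request.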

Related Articles

[2603.29171] Segmentation of Gray Matters and White Matters from Brain MRI data
arXiv - Machine Learning · 4 min

[2602.09924] LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations
arXiv - Machine Learning · 3 min

[2602.01528] Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning
arXiv - Machine Learning · 4 min

[2601.22783] Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval
arXiv - Machine Learning · 4 min

