[2602.18922] Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning
Summary
The paper discusses the limitations of current agent caching methods in AI, proposing a new framework, W5H2, that improves efficiency and reduces costs through structured intent canonicalization and few-shot learning.
Why It Matters
This research addresses significant inefficiencies in AI agent operations, particularly in caching mechanisms that lead to high costs. By introducing a new framework, it offers a potential solution that could enhance performance and reduce operational expenses in AI applications across various languages and contexts.
Key Takeaways
- Current caching methods for AI agents are ineffective: GPTCache reaches only 37.9% accuracy on real benchmarks, and APC reaches 0-12%.
- The proposed W5H2 framework significantly improves cache effectiveness (91.1% on MASSIVE with few-shot SetFit) and projects a 97.5% cost reduction.
- Few-shot learning techniques can enhance performance across multiple languages.
- The study introduces a new multilingual dataset for evaluating agent performance.
- Risk-controlled selective prediction guarantees are provided to ensure reliability.
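To make the caching idea concrete, here is a minimal sketch of how structured intent canonicalization can yield consistent cache keys. The W5H2-style field names (`what`, `where`, `when`) and the normalization steps are assumptions for illustration; the paper's actual decomposition schema is not reproduced here.

```python
import hashlib
import json

def canonical_cache_key(intent: dict) -> str:
    """Derive a deterministic cache key from a structured intent.

    Hypothetical sketch: field names and normalization are assumptions,
    not the paper's actual W5H2 schema.
    """
    # Normalize field values so trivially different paraphrases
    # (case, stray whitespace) collapse to the same key.
    normalized = {k: str(v).strip().lower() for k, v in intent.items()}
    # Sorted, canonical JSON makes the key independent of field order.
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Two differently worded requests that an intent classifier maps to the
# same structured decomposition should produce the same cache key.
intent_a = {"what": "Weather Forecast", "where": "Paris ", "when": "tomorrow"}
intent_b = {"when": "Tomorrow", "what": "weather forecast", "where": "paris"}

assert canonical_cache_key(intent_a) == canonical_cache_key(intent_b)
```

This is the "key consistency" property the paper argues caching actually needs: equivalent intents must map to one key, regardless of surface phrasing.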
Computer Science > Computation and Language, arXiv:2602.18922 (cs)
Submitted on 21 Feb 2026
Title: Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning
Authors: Abhinaba Basu
Abstract
Personal AI agents incur substantial cost via repeated LLM calls. We show existing caching methods fail: GPTCache achieves 37.9% accuracy on real benchmarks; APC achieves 0-12%. The root cause is optimizing for the wrong property: cache effectiveness requires key consistency and precision, not classification accuracy. We observe cache-key evaluation reduces to clustering evaluation and apply V-measure decomposition to separate these on n=8,682 points across MASSIVE, BANKING77, CLINC150, and NyayaBench v2, our new 8,514-entry multilingual agentic dataset (528 intents, 20 W5H2 classes, 63 languages). We introduce W5H2, a structured intent decomposition framework. Using SetFit with 8 examples per class, W5H2 achieves 91.1% ± 1.7% on MASSIVE in ~2 ms, versus 37.9% for GPTCache and 68.8% for a 20B-parameter LLM at 3,447 ms. On NyayaBench v2 (20 classes), SetFit achieves 55.3%, with cross-lingual transfer across 30 languages. Our five-tier cascade handles 85% of interactions locally, projecting 97.5% cost reduction. We provide risk-controlled selective prediction guarantees via ...