[2602.02007] Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
Summary
The paper introduces xMemory, an agent memory system that disentangles memories into semantic components, arranges them in a hierarchy, and uses that hierarchy to drive retrieval, improving answer quality and token efficiency in dialogue contexts.
Why It Matters
This research addresses a mismatch between standard Retrieval-Augmented Generation (RAG) and agent memory. RAG targets large, heterogeneous corpora, whereas agent memory is a bounded, coherent dialogue stream with highly correlated spans, so fixed top-k similarity retrieval tends to return redundant context. By proposing a hierarchical structure for memory retrieval, the paper aims to make responses in AI-driven dialogue more efficient and more relevant.
Key Takeaways
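- Fixed top-k similarity retrieval, as used in standard RAG, tends to return redundant context on coherent dialogue streams, and post-hoc pruning can delete temporally linked prerequisites needed for correct reasoning.
- xMemory uses a hierarchical structure to improve retrieval efficiency.
- The proposed method improves answer quality and reduces the number of tokens passed to the LLM.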
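The redundancy problem in the first takeaway can be illustrated with a small sketch. This is not xMemory and not the authors' code. It only shows fixed top-k similarity retrieval on a toy dialogue memory, where near-duplicate turns fill every slot and push out a related but differently worded turn. The bag-of-words `embed` function, the sample `memory` list, and all function names are assumptions made for this illustration. A real system would use a learned embedding model.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, memory, k):
    """Fixed top-k retrieval: rank every span by similarity to the query."""
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

def mean_pairwise_similarity(texts):
    """Average similarity among retrieved spans; higher means more redundant."""
    vecs = [embed(t) for t in texts]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)

# Hypothetical dialogue memory: three near-duplicate turns, one turn that
# adds new information about the same flight, and one unrelated turn.
memory = [
    "alice booked a flight to paris for friday",
    "alice booked the flight to paris on friday",
    "alice has booked a flight to paris friday",
    "the paris flight departs from gate twelve",
    "bob prefers tea over coffee",
]

query = "when is alice flight to paris"
retrieved = top_k(query, memory, k=3)
redundancy = mean_pairwise_similarity(retrieved)

for span in retrieved:
    print(span)
print(f"mean pairwise similarity among retrieved spans: {redundancy:.2f}")
```

With k=3, all three retrieved spans are paraphrases of the same booking, and the gate information never reaches the context. The abstract's claim is that this kind of overlap is typical of agent memory, which motivates retrieving over a structured hierarchy instead of a flat similarity ranking.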
Computer Science > Computation and Language
arXiv:2602.02007 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
Authors: Zhanghao Hu, Qinglin Zhu, Hanqi Yan, Yulan He, Lin Gui
Abstract: Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting. RAG targets large, heterogeneous corpora where retrieved passages are diverse, whereas agent memory is a bounded, coherent dialogue stream with highly correlated spans that are often duplicates. Under this shift, fixed top-$k$ similarity retrieval tends to return redundant context, and post-hoc pruning can delete temporally linked prerequisites needed for correct reasoning. We argue retrieval should move beyond similarity matching and instead operate over latent components, following decoupling to aggregation: disentangle memories into semantic components, organise them into a hierarchy, and use this structure to drive retrieval. We propose xMemory, which builds a hierarchy of intact units and maintains a searchable yet faithful high-level node organisation via a sparsity--semantics objective that guides memory split and merge. At inference, xMemory retrieves top-down, selecting a c...