[2602.13967] Neuromem: A Granular Decomposition of the Streaming Lifecycle in External Memory for LLMs
Summary
The paper presents Neuromem, a framework for evaluating external memory modules in large language models (LLMs) under a dynamic streaming context, focusing on the lifecycle of memory management.
Why It Matters
As LLMs increasingly rely on external memory for real-time data integration, understanding how to manage memory effectively is crucial for improving accuracy and efficiency. Neuromem addresses the challenges of evolving memory states, providing insights into performance degradation and optimal data structures.
Key Takeaways
- Neuromem benchmarks external memory modules in LLMs under interleaved insertion and retrieval scenarios.
- Memory lifecycle management is critical, with performance typically degrading as memory size increases.
- The choice of memory data structure significantly impacts the quality of outcomes.
- Aggressive compression techniques shift costs between insertion and retrieval without substantial accuracy improvements.
- Time-related queries are identified as the most challenging query category.
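The interleaved insertion-and-retrieval setting named in the first takeaway can be pictured with a small sketch. It is not Neuromem's code: `ToyMemory`, `replay`, the word-overlap scoring, and the example stream are all hypothetical, chosen only to show that the same query can see different memory states depending on when it arrives, and that insertion and retrieval latency can be tracked separately.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical stand-in for an external memory module (not Neuromem's API)."""
    facts: list = field(default_factory=list)

    def insert(self, text: str) -> None:
        self.facts.append(text)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Toy scoring (an assumption): rank stored facts by word overlap with the query.
        q = set(query.lower().split())
        ranked = sorted(
            self.facts,
            key=lambda f: len(q & set(f.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def replay(stream, memory):
    """Replay ("insert", text) and ("query", text) events in arrival order."""
    results = []
    latency = {"insert": 0.0, "query": 0.0}
    for kind, text in stream:
        start = time.perf_counter()
        if kind == "insert":
            memory.insert(text)
        else:
            # The query only sees facts inserted before it in the stream.
            results.append((text, memory.retrieve(text)))
        latency[kind] += time.perf_counter() - start
    return results, latency


stream = [
    ("insert", "Alice moved to Paris in 2021"),
    ("query", "Where does Alice live"),
    ("insert", "Alice moved to Berlin in 2023"),
    ("query", "Where does Alice live"),
]
results, latency = replay(stream, ToyMemory())
```

In this toy stream the first query is answered from a memory holding one fact and the second from a memory holding two, which is the kind of evolving state a static, build-then-query evaluation would not exercise.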
Computer Science > Artificial Intelligence
arXiv:2602.13967 (cs)
[Submitted on 15 Feb 2026]
Title: Neuromem: A Granular Decomposition of the Streaming Lifecycle in External Memory for LLMs
Authors: Ruicheng Zhang, Xinyi Li, Tianyi Xu, Shuhao Zhang, Xiaofei Liao, Hai Jin
Abstract: Most evaluations of External Memory Modules assume a static setting: memory is built offline and queried at a fixed state. In practice, memory is streaming: new facts arrive continuously, insertions interleave with retrievals, and the memory state evolves while the model is serving queries. In this regime, accuracy and cost are governed by the full memory lifecycle, which encompasses the ingestion, maintenance, retrieval, and integration of information into generation. We present Neuromem, a scalable testbed that benchmarks External Memory Modules under an interleaved insertion-and-retrieval protocol and decomposes their lifecycle into five dimensions: memory data structure, normalization strategy, consolidation policy, query formulation strategy, and context integration mechanism. Using three representative datasets, LOCOMO, LONGMEMEVAL, and MEMORYAGENTBENCH, Neuromem evaluates interchangeable variants within a shared serving stack, reporting token-level F1 and insertion/retrieval latency. Overall, we observe...
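The abstract reports token-level F1 but this excerpt does not say how answers are tokenized or normalized. As a rough illustration only, the sketch below uses the common question-answering convention of lowercasing, splitting on whitespace, and counting overlapping tokens; the function name `token_f1` and that normalization are assumptions, not the paper's definition.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumes lowercase + whitespace tokenization; the paper may normalize differently.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


score = token_f1("Alice lives in Berlin", "Berlin")
```

Here the prediction has four tokens and one of them matches the single reference token, so precision is 0.25, recall is 1.0, and the F1 score is 0.4. A verbose but correct answer is therefore penalized relative to a concise one.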