[2509.23040] Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Summary
The paper presents ReMemR1, an approach that enhances long-context reasoning in large language models by integrating memory retrieval into the memory update process (a "revisitable" memory) and training the agent with a multi-level reward design.
Why It Matters
Large language models struggle with long-context question answering because key evidence can be pruned or overwritten as a bounded memory buffer is updated during reading. By letting the agent look back at earlier memories instead of relying only on a linear scan, this work mitigates that information loss and strengthens multi-hop reasoning over inputs spanning millions of tokens.
Key Takeaways
- ReMemR1 integrates memory retrieval into memory updates for better reasoning.
- The multi-level reward system enhances training effectiveness.
- The approach mitigates information degradation and supports multi-hop reasoning.
- Extensive experiments show significant performance improvements over existing methods.
- The solution incurs negligible computational overhead, making it efficient.
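The takeaways above can be made concrete with a toy sketch. The class below is illustrative only (the name `RevisitableMemory` and its methods are assumptions, not the paper's code): a bounded buffer is updated per chunk as in "memorize while reading", but every note is also archived so evicted evidence can be called back later for non-linear reasoning.

```python
class RevisitableMemory:
    """Toy 'memorize while reading' buffer that archives every note,
    so evidence evicted from the working buffer can still be revisited.
    Purely illustrative; not the paper's implementation."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.active = []   # bounded working memory, updated per chunk
        self.archive = []  # full history, enabling selective callback

    def update(self, note):
        # Linear-scan update: the active buffer may evict old notes,
        # which is where plain memorize-while-reading loses information.
        self.archive.append(note)
        self.active.append(note)
        if len(self.active) > self.capacity:
            self.active.pop(0)

    def callback(self, query):
        # Revisit: retrieve archived notes matching the query,
        # even ones no longer in the active buffer.
        return [n for n in self.archive if query.lower() in n.lower()]


mem = RevisitableMemory(capacity=2)
for chunk in ["Alice met Bob in Paris.",
              "Bob works at Acme.",
              "The meeting happened in 2019.",
              "Acme builds rockets."]:
    mem.update(chunk)

print(mem.active)             # only the two most recent notes remain
print(mem.callback("Paris"))  # evicted evidence is still retrievable
```

The point of the sketch is the contrast: `active` alone would fail a multi-hop question about Paris, while `callback` recovers the dropped evidence.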
Computer Science > Computation and Language
arXiv:2509.23040 (cs)
[Submitted on 27 Sep 2025 (v1), last revised 21 Feb 2026 (this version, v4)]
Title: Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Authors: Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang Wang, An Zhang
Abstract: Large language models face challenges in long-context question answering, where key evidence of a query may be dispersed across millions of tokens. Existing works equip large language models with a memory buffer that is dynamically updated via a linear document scan, also known as the "memorize while reading" methods. While this approach scales efficiently, it suffers from pruning of latent evidence, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, which integrates the mechanism of memory retrieval into the memory update process, enabling the agent to selectively call back historical memories for non-linear reasoning. To further strengthen training, we propose a multi-level reward design, which combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support complex multi-hop reasoning.
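The multi-level reward described in the abstract can be sketched as a weighted mix of a sparse final-answer reward and dense step-level signals. The function below is a hedged illustration under assumed shapes: the weight `alpha` and the interpretation of `step_signals` (e.g., 1 if a memory callback retrieved useful evidence, else 0) are hypothetical, not the paper's actual formulation.

```python
def multi_level_reward(final_correct, step_signals, alpha=0.5):
    """Illustrative combination of a sparse outcome reward with dense
    step-level signals. `alpha` trades off the two levels; both the
    weighting and the signal semantics are assumptions for exposition."""
    final_r = 1.0 if final_correct else 0.0
    # Average the per-step signals so the dense term stays in [0, 1].
    step_r = sum(step_signals) / max(len(step_signals), 1)
    return (1 - alpha) * final_r + alpha * step_r


# Example: the final answer is wrong, but 2 of 3 memory operations
# were judged useful, so the agent still receives a learning signal.
r = multi_level_reward(False, [1, 1, 0])
print(r)
```

The design intent mirrored here is that a trajectory with productive memory use is not scored zero just because the final answer missed, which densifies an otherwise sparse RL signal.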