[2510.27246] Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

Summary

This paper presents a new framework for evaluating and enhancing long-term memory in large language models (LLMs), introducing the BEAM benchmark and the LIGHT memory system.

Why It Matters

As LLMs become increasingly integral to conversational AI, understanding their long-term memory capabilities is crucial. This research addresses existing limitations in benchmarks and proposes innovative solutions that could significantly improve LLM performance in complex dialogue scenarios.

Key Takeaways

  • Introduces BEAM, a benchmark for assessing LLMs' long-term memory.
  • Proposes LIGHT, a memory framework enhancing LLMs with multiple memory systems.
  • Demonstrates that existing LLMs struggle with longer dialogues without enhancements.
  • Reports performance improvements of 3.5% to 12.69% using the LIGHT framework.
  • Highlights the need for coherent and diverse conversation datasets in LLM training.

Computer Science > Computation and Language · arXiv:2510.27246 (cs)
[Submitted on 31 Oct 2025 (v1), last revised 21 Feb 2026 (this version, v2)]

Title: Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
Authors: Mohammad Tavakoli, Alireza Salemi, Carrie Ye, Mohamed Abdalla, Hamed Zamani, J Ross Mitchell

Abstract: Evaluating the abilities of large language models (LLMs) on tasks that require long-term memory, and thus long-context reasoning, for example in conversational settings, is hampered by existing benchmarks, which often lack narrative coherence, cover narrow domains, and only test simple recall-oriented tasks. This paper introduces a comprehensive solution to these challenges. First, we present a novel framework for automatically generating long (up to 10M tokens), coherent, and topically diverse conversations, accompanied by probing questions targeting a wide range of memory abilities. From this, we construct BEAM, a new benchmark comprising 100 conversations and 2,000 validated questions. Second, to enhance model performance, we propose LIGHT, a framework inspired by human cognition that equips LLMs with three complementary memory systems: a long-term episodic memory, a short-term working memory, and a scratchpad for accumulating salient facts. Our experiments on B...
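The three-memory design the abstract describes can be pictured with a minimal sketch. The class and method names below are hypothetical, and the keyword-overlap retrieval is a deliberately toy stand-in for whatever retrieval LIGHT actually uses; this only illustrates the division of labor between an episodic archive, a short working-memory window, and an append-only scratchpad of salient facts.

```python
from collections import deque

class ThreeMemoryStore:
    """Illustrative sketch of a LIGHT-style memory layout (all names hypothetical):
    - episodic: archive of older dialogue turns, searched at recall time
    - working: a short window of the most recent turns
    - scratchpad: an append-only list of salient facts
    """

    def __init__(self, working_size=4):
        self.episodic = []                          # list of (turn_id, text)
        self.working = deque(maxlen=working_size)   # recent turns only
        self.scratchpad = []                        # accumulated facts

    def add_turn(self, turn_id, text, salient_fact=None):
        # A turn that falls out of the working window moves to episodic memory.
        if len(self.working) == self.working.maxlen:
            self.episodic.append(self.working[0])
        self.working.append((turn_id, text))
        if salient_fact:
            self.scratchpad.append(salient_fact)

    def recall(self, query, k=2):
        # Toy retrieval: rank episodic entries by word overlap with the query.
        q = set(query.lower().split())
        scored = sorted(
            self.episodic,
            key=lambda item: len(q & set(item[1].lower().split())),
            reverse=True,
        )
        return [text for _, text in scored[:k]]

    def context(self, query):
        # Assemble prompt context: facts + retrieved episodes + recent turns.
        return {
            "facts": list(self.scratchpad),
            "episodes": self.recall(query),
            "recent": [text for _, text in self.working],
        }
```

The point of the split is that each store answers a different question: the scratchpad holds distilled facts that should always be in context, the working memory preserves local coherence, and only the episodic archive needs retrieval, which is what keeps the prompt small even as the conversation grows toward millions of tokens.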

Related Articles

What is AI, how do apps like ChatGPT work and why are there concerns?

AI is transforming modern life, but some critics worry about its potential misuse and environmental impact.

AI News - General · 7 min
[2603.29957] Think Anywhere in Code Generation

Abstract page for arXiv paper 2603.29957: Think Anywhere in Code Generation

arXiv - Machine Learning · 3 min
[2603.16880] NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning

Abstract page for arXiv paper 2603.16880: NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectr...

arXiv - Machine Learning · 4 min
[2512.21106] Semantic Refinement with LLMs for Graph Representations

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min