[2509.25380] Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs

arXiv - Machine Learning

Summary

The paper introduces the Training Re-evaluation Curve (TREC), a diagnostic that re-scores training batches with the final model weights to guide data placement in LLM training; aligning high-quality data with TREC minima yields significant performance gains.

Why It Matters

Understanding how to effectively place training data is crucial for enhancing the performance of large language models (LLMs). The TREC provides a predictive framework that can guide data curation strategies, ultimately leading to more efficient training processes and better model outcomes.

Key Takeaways

  • The Training Re-evaluation Curve (TREC) re-evaluates training batches with the final model weights, measuring how well the trained model retains data as a function of when it was encountered during training.
  • Optimal data placement at TREC minima can significantly enhance LLM performance.
  • TRECs can be predicted in advance, allowing for proactive data curriculum design.
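
As a toy illustration of the first takeaway (the names, toy loss, and data here are our own, not the paper's code), a TREC is computed by re-scoring every training batch with the final model parameters and indexing the losses by the step at which each batch was originally seen:

```python
def compute_trec(loss_fn, final_params, batches):
    """Hypothetical sketch of a Training Re-evaluation Curve (TREC):
    score every training batch with the FINAL model parameters; entry t
    is the loss on the batch that was seen at training step t."""
    return [loss_fn(final_params, batch) for batch in batches]

def toy_loss(params, batch):
    # Stand-in "model": a single constant prediction, scored with MSE.
    return sum((x - params) ** 2 for x in batch) / len(batch)

# One toy batch per training step; the curve's minimum marks the
# best-retained position, where high-quality data would be placed.
trec = compute_trec(toy_loss, 2.0, [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]])
best_step = min(range(len(trec)), key=trec.__getitem__)
```

The real diagnostic replaces the toy loss with the LLM's training loss; the indexing-by-step structure is what makes the curve a function of *when* data was seen.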

Computer Science > Machine Learning — arXiv:2509.25380 (cs)

[Submitted on 29 Sep 2025 (v1), last revised 17 Feb 2026 (this version, v2)]

Title: Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs

Authors: Shane Bergsma, Nolan Dey, Joel Hestness

Abstract: Data curriculums have become central to successful LLM training, yet principles governing optimal data placement remain unclear. We introduce the *training re-evaluation curve (TREC)*, a diagnostic that retrospectively evaluates training batches *using the final model weights*. The TREC characterizes how well a trained model retains training data as a function of *when* the data was encountered during training. Analyzing TRECs for models from 111M to 3.9B parameters, we show that placing high-quality data at low points on the TREC significantly improves performance. Importantly, while a TREC is initially observable only after training, we demonstrate it can be *predicted in advance* from AdamW's implicit EMA coefficients, enabling proactive curriculum design. By predicting TRECs for published training recipes, we explain prior ablations and reveal suboptimal data placements. We also align high-quality data with TREC minima in order to improve continual pre-training of a 3.9B-parameter LLM trained on 900B tokens.
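The "predicted in advance" claim rests on the observation that AdamW's decoupled weight decay multiplies the weights by roughly (1 − lr·wd) each step, so the final weights behave like an EMA of per-step updates. A minimal sketch of that idea (our own simplification, not the paper's exact derivation; `lrs` and `weight_decay` are illustrative parameters):

```python
def predicted_retention(lrs, weight_decay):
    """Simplified assumption, not the paper's exact formula: the update
    applied at step t survives to the end of training scaled by
    prod_{s>t} (1 - lr_s * weight_decay).  Low retention at a step
    suggests a high (poorly retained) point on the TREC there."""
    T = len(lrs)
    retention = [1.0] * T
    for t in range(T):
        for s in range(t + 1, T):
            retention[t] *= 1.0 - lrs[s] * weight_decay
    return retention

# With a constant learning rate, later data is always better retained;
# a decaying schedule flattens retention near the end of training.
retention_const = predicted_retention([1e-3] * 5, 0.1)
```

Because this product depends only on the learning-rate schedule and weight decay, both known before training, the curve's shape can be anticipated and high-quality data scheduled into its best-retained regions.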
