[2509.25087] Scaling with Collapse: Efficient and Predictable Training of LLM Families
Computer Science > Machine Learning
arXiv:2509.25087 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Scaling with Collapse: Efficient and Predictable Training of LLM Families
Authors: Shane Bergsma, Bin Claire Zhang, Nolan Dey, Shaheer Muhammad, Gurpreet Gosal, Joel Hestness

Abstract: Effective LLM training depends on predictable scaling of key quantities -- such as final loss and optimal hyperparameters -- with model and dataset size. Qiu et al. (2025) recently showed that this predictability can extend beyond scalars: whole training loss curves can *collapse* onto a universal trajectory after a simple normalization. What remains unclear is whether this phenomenon persists for LLM families trained under *practical scaling recipes*, where width, depth, learning rate, batch size, and weight decay are scaled jointly. We show that it does: loss curves collapse across scales precisely when optimization hyperparameters are set optimally for the given data budget, in accordance with recent empirical scaling laws. Collapse therefore emerges as a signature of compute-efficient training. We demonstrate two applications at scale: (1) deviation-from-collapse provides a sensitive, early diagnostic of training pathologies, and (2) predictability of collapsed curves enables early s...
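The abstract does not spell out the normalization that produces the collapse, so the sketch below is only an illustration of the general idea, not the paper's method. It assumes a simple normalization in which the step axis is rescaled to the fraction of the data budget consumed and the loss axis is rescaled by each curve's final loss; the function names (`normalize_curve`, `collapse_deviation`) and the synthetic curves are hypothetical.

```python
import numpy as np

def normalize_curve(steps, losses):
    """Map a raw loss curve onto a normalized trajectory.

    Assumed normalization for illustration: step axis becomes the fraction
    of the training budget consumed; loss axis is divided by the final loss.
    """
    frac = np.asarray(steps, dtype=float) / steps[-1]
    rel_loss = np.asarray(losses, dtype=float) / losses[-1]
    return frac, rel_loss

def collapse_deviation(curves, grid_size=200):
    """Measure how tightly a family of normalized curves collapses.

    `curves` is a list of (steps, losses) pairs, one per model scale.
    Returns the mean standard deviation across curves on a shared grid;
    a small value suggests collapse, while a growing value could serve as
    an early flag for a training pathology.
    """
    grid = np.linspace(0.05, 1.0, grid_size)  # skip the noisy warmup region
    resampled = []
    for steps, losses in curves:
        frac, rel_loss = normalize_curve(steps, losses)
        resampled.append(np.interp(grid, frac, rel_loss))
    return float(np.std(np.stack(resampled), axis=0).mean())

if __name__ == "__main__":
    # Three synthetic curves standing in for models of increasing size.
    rng = np.random.default_rng(0)
    curves = []
    for scale in (1.0, 2.0, 4.0):
        steps = np.arange(1, 1001)
        losses = 2.5 / scale**0.1 * steps**-0.15 + 0.01 * rng.standard_normal(steps.size)
        curves.append((steps, losses))
    print(f"mean deviation from collapse: {collapse_deviation(curves):.4f}")
```

In this toy setup, well-scaled curves yield a small, stable deviation on the shared grid, whereas a curve whose hyperparameters are off-recipe (or whose training is diverging) pulls the deviation upward well before its final loss is known.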