[2602.05298] Logarithmic-time Schedules for Scaling Language Models with Momentum

arXiv - Machine Learning 4 min read Article

Summary

This article presents a novel optimizer, ADANA, which utilizes logarithmic-time scheduling for hyperparameters in large-scale language model training, achieving significant performance gains.

Why It Matters

As language models grow in size and complexity, optimizing their training efficiency becomes crucial. This research introduces a method that enhances compute efficiency by up to 40%, making it relevant for developers and researchers in machine learning and AI, particularly those focused on optimizing large models.

Key Takeaways

  • The ADANA optimizer improves training efficiency by applying logarithmic-time schedules to AdamW's momentum and weight-decay hyperparameters.
  • When properly tuned, ADANA achieves up to 40% compute-efficiency gains relative to a tuned AdamW baseline.
  • Longer gradient memory horizons can enhance performance in large-scale language model training.
  • Damping mechanisms are essential for maintaining stability in the new scheduling approach.
  • Logarithmic-time scheduling benefits can also be applied to other optimizers like AdEMAMix.
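The core idea in the takeaways above — a gradient-memory horizon that grows with training time — can be sketched as a momentum coefficient parameterized by a growing half-life. This is a hypothetical illustration, not the schedule from the paper: the growth rate and the cap (a crude stand-in for the paper's damping mechanisms) are made-up values.

```python
def log_time_beta(step, base_half_life=100.0, growth=0.1, max_half_life=1e5):
    """Illustrative log-time momentum schedule (NOT the paper's exact rule).

    The EMA coefficient beta is parameterized by its half-life: the number of
    steps after which a past gradient's weight has halved.  Letting the
    half-life grow with the step count expands the gradient-memory horizon
    over training; capping it is a crude stand-in for damping.
    """
    half_life = min(base_half_life + growth * step, max_half_life)
    return 0.5 ** (1.0 / half_life)
```

With these (assumed) defaults the half-life grows linearly in the step count, so beta drifts upward from roughly 0.993 toward 1 and stops changing once the half-life hits the cap.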

Statistics > Machine Learning — arXiv:2602.05298 (stat)
[Submitted on 5 Feb 2026 (v1), last revised 18 Feb 2026 (this version, v2)]

Title: Logarithmic-time Schedules for Scaling Language Models with Momentum
Authors: Damien Ferbach, Courtney Paquette, Gauthier Gidel, Katie Everett, Elliot Paquette

Abstract: In practice, the hyperparameters $(\beta_1, \beta_2)$ and weight-decay $\lambda$ in AdamW are typically kept at fixed values. Is there any reason to do otherwise? We show that for large-scale language model training, the answer is yes: by exploiting the power-law structure of language data, one can design time-varying schedules for $(\beta_1, \beta_2, \lambda)$ that deliver substantial performance gains. We study logarithmic-time scheduling, in which the optimizer's gradient memory horizon grows with training time. Although naive variants of this are unstable, we show that suitable damping mechanisms restore stability while preserving the benefits of longer memory. Based on this, we present ADANA, an AdamW-like optimizer that couples log-time schedules with explicit damping to balance stability and performance. We empirically evaluate ADANA across transformer scalings (45M to 2.6B parameters), comparing against AdamW, Muon, and AdEMAMix. When properly tuned, ADANA achieves up to 40% compute efficiency relative to a tuned AdamW, w…
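To show where time-varying $(\beta_1, \beta_2, \lambda)$ would slot into an AdamW-style update, here is a minimal single-parameter sketch. This is plain AdamW with per-step betas, not ADANA — the paper's actual update rule and damping mechanism are not reproduced here. One detail worth noting: under a schedule, the usual bias-correction term $1-\beta^t$ generalizes to one minus the running product of the betas seen so far.

```python
import math

def adamw_step(param, grad, m, v, beta1_t, beta2_t, prod1, prod2,
               lr=1e-3, eps=1e-8, wd=0.0):
    """One AdamW-style update with step-dependent betas (illustrative sketch).

    prod1/prod2 are the running products of beta1/beta2 so far (start at 1.0);
    under a time-varying schedule they replace beta**t in the bias correction.
    """
    m = beta1_t * m + (1.0 - beta1_t) * grad          # first-moment EMA
    v = beta2_t * v + (1.0 - beta2_t) * grad * grad   # second-moment EMA
    prod1 *= beta1_t
    prod2 *= beta2_t
    m_hat = m / (1.0 - prod1)                         # bias-corrected moments
    v_hat = v / (1.0 - prod2)
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * param)
    return param, m, v, prod1, prod2
```

A positive gradient should nudge the parameter down by roughly the learning rate on the first step, since the bias-corrected signal-to-noise ratio starts near 1.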
