[2602.06797] Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

arXiv - Machine Learning

Summary

This paper derives optimal learning-rate schedules (LRSs) within the functional scaling law framework, revealing a sharp phase transition between easy- and hard-task regimes, and drawing practical lessons for training machine learning models.

Why It Matters

Understanding optimal learning-rate schedules is crucial for improving training efficiency in machine learning models, particularly in large language models (LLMs). This research offers theoretical foundations and practical insights that can enhance model performance and reduce training time.

Key Takeaways

  • Optimal learning-rate schedules vary significantly between easy and hard tasks.
  • Power decay and warmup-stable-decay are key strategies for effective training.
  • The study provides a principled evaluation of commonly used learning-rate schedules.
  • Numerical experiments validate the theoretical predictions of optimal LRSs.
  • Insights from this research can guide practitioners in tuning learning rates for better model performance.

Statistics > Machine Learning — arXiv:2602.06797 (stat)

[Submitted on 6 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v2)]

Title: Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

Authors: Binghui Li, Zilin Wang, Fengling Chen, Shiyang Zhao, Ruiheng Zheng, Lei Wu

Abstract: We study optimal learning-rate schedules (LRSs) under the functional scaling law (FSL) framework introduced in Li et al. (2025), which accurately models the loss dynamics of both linear regression and large language model (LLM) pre-training. Within FSL, loss dynamics are governed by two exponents: a source exponent $s>0$ controlling the rate of signal learning, and a capacity exponent $\beta>1$ determining the rate of noise forgetting. Focusing on a fixed training horizon $N$, we derive the optimal LRSs and reveal a sharp phase transition. In the easy-task regime $s \ge 1 - 1/\beta$, the optimal schedule follows a power decay to zero, $\eta^*(z) = \eta_{\mathrm{peak}}(1 - z/N)^{2\beta - 1}$, where the peak learning rate scales as $\eta_{\mathrm{peak}} \eqsim N^{-\nu}$ for an explicit exponent $\nu = \nu(s,\beta)$. In contrast, in the hard-task regime $s < 1 - 1/\beta$, the optimal LRS exhibits a warmup-stable-decay (WSD) (Hu et al., 2024) structure: it maintains…
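The easy-task power-decay formula from the abstract translates directly into code. The sketch below implements $\eta^*(z) = \eta_{\mathrm{peak}}(1 - z/N)^{2\beta - 1}$ as stated, plus an illustrative WSD-shaped schedule; since the abstract is truncated, the WSD warmup and plateau lengths here are assumed hyperparameters, not the paper's derived optima.

```python
def power_decay_lr(z: float, N: int, eta_peak: float, beta: float) -> float:
    """Easy-task regime (s >= 1 - 1/beta): power decay to zero,
    eta(z) = eta_peak * (1 - z/N)^(2*beta - 1)."""
    return eta_peak * (1.0 - z / N) ** (2.0 * beta - 1.0)


def wsd_lr(z: float, N: int, eta_peak: float, beta: float,
           warmup: int, decay_start: int) -> float:
    """Warmup-stable-decay sketch: linear warmup to eta_peak, constant
    plateau, then a power-law decay to zero over the final phase.
    `warmup` and `decay_start` are illustrative choices, not the
    paper's optimal phase boundaries."""
    if z < warmup:                      # linear warmup
        return eta_peak * z / warmup
    if z < decay_start:                 # stable plateau
        return eta_peak
    frac = (z - decay_start) / (N - decay_start)
    return eta_peak * (1.0 - frac) ** (2.0 * beta - 1.0)
```

For example, with $N = 1000$, $\eta_{\mathrm{peak}} = 0.1$, and $\beta = 2$, the power-decay schedule starts at 0.1, passes through $0.1 \cdot 0.5^3 = 0.0125$ at the halfway point, and reaches zero exactly at step $N$.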
