[2511.03952] High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Summary
This paper establishes high-dimensional limit theorems for Stochastic Gradient Descent (SGD) with Polyak Momentum and adaptive step-sizes, rigorously compares these variants with online SGD, and demonstrates their benefits on two concrete learning problems.
Why It Matters
Understanding the dynamics of SGD in high-dimensional settings is crucial for improving machine learning algorithms. This research provides a rigorous framework that can enhance the performance of SGD variants, potentially leading to better convergence rates and stability in practical applications.
Key Takeaways
- At a fixed common step-size, SGD with Polyak Momentum amplifies high-dimensional effects, potentially degrading performance relative to online SGD.
- Adaptive step-sizes based on normalized gradients stabilize SGD dynamics, admitting fixed points closer to the population minimum and widening the range of step-sizes for which the iterates converge.
- The paper provides a rigorous comparison between SGD variants under high-dimensional scaling.
- Two learning problems, Spiked Tensor PCA and Single Index Models, showcase the practical implications of the findings.
- Early preconditioners can enhance SGD performance in challenging scenarios.
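The three update rules compared above can be sketched in a few lines. This is a minimal illustration with hypothetical function names on a noisy quadratic stand-in for a generic learning problem, not the paper's exact high-dimensional setup:

```python
import numpy as np

def sgd_step(theta, grad, lr):
    """Vanilla online SGD: one stochastic gradient per fresh sample."""
    return theta - lr * grad

def sgd_momentum_step(theta, velocity, grad, lr, beta=0.9):
    """SGD with Polyak (heavy-ball) momentum: the velocity buffer accumulates
    past gradients, so the same lr produces larger effective steps."""
    velocity = beta * velocity + grad
    return theta - lr * velocity, velocity

def sgd_normalized_step(theta, grad, lr, eps=1e-12):
    """Online SGD with a normalized-gradient adaptive step-size: the effective
    step length is lr regardless of the raw gradient magnitude."""
    return theta - lr * grad / (np.linalg.norm(grad) + eps)

# Toy demo: minimize f(theta) = 0.5 * ||theta||^2 from noisy gradients.
rng = np.random.default_rng(0)
d, lr, steps = 10, 0.1, 200
theta_sgd = np.ones(d)
theta_mom, vel = np.ones(d), np.zeros(d)
for _ in range(steps):
    noise = 0.01 * rng.standard_normal(d)
    theta_sgd = sgd_step(theta_sgd, theta_sgd + noise, lr)
    theta_mom, vel = sgd_momentum_step(theta_mom, vel, theta_mom + noise, lr)
```

With momentum, the effective step-size is inflated by roughly 1/(1 - beta), which is one intuition for why keeping lr fixed when adding momentum can amplify noise effects rather than help.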
Statistics > Machine Learning, arXiv:2511.03952 (stat)
[Submitted on 6 Nov 2025 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Authors: Aukosh Jagannath, Taj Jones-McCormick, Varnan Sarangian
Abstract: We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigorously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide with those of online SGD after an appropriate time rescaling and a specific choice of step-size. However, if the step-size is kept the same between the two algorithms, SGD-M will amplify high-dimensional effects, potentially degrading performance relative to online SGD. We demonstrate our framework on two popular learning problems: Spiked Tensor PCA and Single Index Models. In both cases, we also examine online SGD with an adaptive step-size based on normalized gradients. In the high-dimensional regime, this algorithm yields multiple benefits: its dynamics admit fixed points closer to the population minimum, and it widens the range of admissible step-sizes for which the iterates converge to such solutions. These examples provide a rigorous account, aligning ...
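As a concrete feel for the single-index setting mentioned in the abstract, here is a hedged toy sketch, not the paper's algorithm or scaling: labels follow y = tanh(⟨x, θ*⟩) with noiseless Gaussian data, and the dimension, link function, and step-size below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 30
theta_star = np.zeros(d)
theta_star[0] = 1.0          # unknown unit direction to recover
theta = np.zeros(d)          # initial iterate
lr = 0.02

# Online SGD on the squared loss 0.5 * (tanh(<x, theta>) - y)^2,
# drawing one fresh Gaussian sample per step.
for _ in range(10_000):
    x = rng.standard_normal(d)
    y = np.tanh(x @ theta_star)
    pred = np.tanh(x @ theta)
    grad = (pred - y) * (1.0 - pred**2) * x   # chain rule: tanh'(z) = 1 - tanh(z)^2
    theta -= lr * grad

# Overlap with the hidden direction; 1.0 means exact recovery of the direction.
overlap = theta @ theta_star / (np.linalg.norm(theta) * np.linalg.norm(theta_star))
```

A normalized-gradient variant would replace the update with `theta -= lr * grad / np.linalg.norm(grad)`; the paper's high-dimensional analysis compares the fixed points and admissible step-size ranges of exactly such variants.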