[2511.03952] High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes

Summary

This paper develops high-dimensional scaling limits for Stochastic Gradient Descent (SGD) with Polyak momentum (SGD-M) and with adaptive step-sizes. It uses them to compare these variants with online SGD on two learning problems, Spiked Tensor PCA and Single Index Models.

Why It Matters

Momentum and adaptive step-sizes are widely used, but their behavior in high dimensions is hard to characterize. The scaling limits in this paper give a rigorous way to compare these variants with online SGD: they show when momentum can degrade performance at a fixed step-size, and when a normalized-gradient step-size widens the range of step-sizes that converge.

Key Takeaways

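  • With the same step-size as online SGD, SGD with Polyak momentum amplifies high-dimensional effects and can degrade performance.
  • After an appropriate time rescaling and a specific choice of step-size, the scaling limits of SGD-M coincide with those of online SGD.
  • An adaptive step-size based on normalized gradients admits fixed points closer to the population minimum and widens the range of step-sizes for which the iterates converge.
  • The framework is demonstrated on two learning problems: Spiked Tensor PCA and Single Index Models.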
  • Early preconditioners can enhance SGD performance in challenging scenarios.
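
As a rough illustration of why the step-size matters when momentum is added, the three updates can be written in a standard textbook form. The symbols here are assumptions, since the excerpt does not show the paper's own parametrization or scaling: \delta is the step-size, \beta the momentum parameter, v_k the velocity, and g_k the stochastic gradient at step k.

```latex
\begin{align*}
\text{online SGD:}\quad & x_{k+1} = x_k - \delta\, g_k, \\
\text{SGD-M (heavy ball):}\quad & v_{k+1} = \beta v_k - \delta\, g_k, \qquad x_{k+1} = x_k + v_{k+1}, \\
\text{normalized SGD:}\quad & x_{k+1} = x_k - \delta\, \frac{g_k}{\lVert g_k \rVert}.
\end{align*}
% Heuristic: freeze the gradient, g_k \equiv g, and start from v_0 = 0.
\[
v_k = -\delta\, \frac{1-\beta^{k}}{1-\beta}\, g
\;\longrightarrow\; -\frac{\delta}{1-\beta}\, g
\qquad (k \to \infty).
\]
```

Under this frozen-gradient heuristic, SGD-M with step-size \delta moves like online SGD with step-size \delta/(1-\beta). That is one informal way to read the first two takeaways; the rescaling used in the paper may differ.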

Statistics > Machine Learning
arXiv:2511.03952 (stat)
[Submitted on 6 Nov 2025 (v1), last revised 18 Feb 2026 (this version, v2)]

Title: High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Authors: Aukosh Jagannath, Taj Jones-McCormick, Varnan Sarangian

Abstract: We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigorously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide with those of online SGD after an appropriate time rescaling and a specific choice of step-size. However, if the step-size is kept the same between the two algorithms, SGD-M will amplify high-dimensional effects, potentially degrading performance relative to online SGD. We demonstrate our framework on two popular learning problems: Spiked Tensor PCA and Single Index Models. In both cases, we also examine online SGD with an adaptive step-size based on normalized gradients. In the high-dimensional regime, this algorithm yields multiple benefits: its dynamics admit fixed points closer to the population minimum, and it widens the range of admissible step-sizes for which the iterates converge to such solutions. These examples provide a rigorous account, aligning ...
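
The three algorithms named in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's setup: the function names, the default parameters, and the toy online least-squares problem are all hypothetical, chosen only to make the update rules concrete. It is not Spiked Tensor PCA or a general Single Index Model, and it is not expected to reproduce the paper's results.

```python
import numpy as np

def sgd_step(x, grad, step):
    """Plain online SGD: move against the gradient of a fresh sample."""
    return x - step * grad

def momentum_step(x, v, grad, step, beta):
    """Heavy-ball (Polyak) momentum in a standard textbook form (assumed here)."""
    v_new = beta * v - step * grad          # velocity: geometric sum of past gradients
    return x + v_new, v_new

def normalized_step(x, grad, step, eps=1e-12):
    """Adaptive step: rescale by the gradient norm so each move has length ~step."""
    return x - step * grad / (np.linalg.norm(grad) + eps)

def run(method, d=50, n_steps=1000, step=0.01, beta=0.9, seed=0):
    """Run one method on a toy online least-squares problem (hypothetical).

    Returns (initial_error, final_error), the distance to the target vector.
    """
    rng = np.random.default_rng(seed)
    target = rng.standard_normal(d) / np.sqrt(d)   # hypothetical ground truth
    x = np.zeros(d)
    v = np.zeros(d)
    err0 = np.linalg.norm(x - target)
    for _ in range(n_steps):
        a = rng.standard_normal(d)                 # one fresh sample per step (online)
        grad = (a @ x - a @ target) * a            # gradient of 0.5 * (a.x - a.target)^2
        if method == "sgd":
            x = sgd_step(x, grad, step)
        elif method == "momentum":
            x, v = momentum_step(x, v, grad, step, beta)
        elif method == "normalized":
            x = normalized_step(x, grad, step)
        else:
            raise ValueError(f"unknown method: {method}")
    return err0, np.linalg.norm(x - target)
```

Calling `run("sgd")`, `run("momentum")`, and `run("momentum", step=0.01 * (1 - 0.9))` gives an informal feel for the same-step-size versus rescaled-step-size comparison described in the abstract. Any numbers it produces are specific to this toy problem.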
