[2601.21093] High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

Summary

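This paper studies the learning dynamics of multi-pass, mini-batch Stochastic Gradient Descent (SGD) for empirical risk minimization in high-dimensional multi-index models. It gives an asymptotically exact characterization of the coordinate-wise dynamics in the regime where sample size and data dimension grow proportionally, for sub-linear batch sizes and a matching "critical" learning-rate scaling.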

Why It Matters

Understanding the dynamics of SGD is crucial for optimizing machine learning algorithms, especially in high-dimensional settings. This research offers insights into how SGD behaves with different batch sizes and learning rates, which can inform better practices in model training and development.

Key Takeaways

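  • The paper gives an asymptotically exact characterization of the coordinate-wise dynamics of multi-pass, mini-batch SGD when the sample size $n$ and data dimension $d$ grow proportionally.
  • The limiting SGD dynamics are the same for any sub-linear batch size scaling $\kappa \asymp n^\alpha$ with $\alpha \in [0,1)$, under a commensurate "critical" scaling of the learning rate.
  • The study highlights the relationship between SGD, the Stochastic Modified Equation (SME), and gradient flow.
  • The limit is a system of dynamical mean-field equations driven by a scalar Poisson jump process that represents SGD sampling noise; an analogous characterization is developed for the SME, which provides a Gaussian diffusion approximation to SGD.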
  • The findings can guide practitioners in selecting appropriate learning rates and batch sizes.
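
For orientation, the three objects named in the takeaways (SGD, gradient flow, and the SME) can be written in a generic textbook form. This is only a sketch: the symbols $\theta$, $\ell$, $\eta$, $B_t$, $\Sigma$ and $W_t$ are notation assumed here, and the paper's own normalization and learning-rate scaling may differ.

```latex
% Mini-batch SGD on the empirical risk R_n, with batch B_t of size \kappa and learning rate \eta
\[
  R_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; x_i, y_i), \qquad
  \theta^{t+1} = \theta^{t} - \frac{\eta}{\kappa} \sum_{i \in B_t} \nabla_\theta\, \ell(\theta^{t}; x_i, y_i).
\]
% Gradient flow: the noiseless continuous-time counterpart
\[
  \frac{d\theta_t}{dt} = -\nabla R_n(\theta_t).
\]
% Textbook SME: gradient flow plus Gaussian noise whose covariance matches that of the mini-batch gradient
\[
  d\theta_t = -\nabla R_n(\theta_t)\,dt + \sqrt{\frac{\eta}{\kappa}}\;\Sigma(\theta_t)^{1/2}\, dW_t,
  \qquad
  \Sigma(\theta) = \frac{1}{n}\sum_{i=1}^{n} \big(\nabla_\theta \ell(\theta; x_i, y_i) - \nabla R_n(\theta)\big)\big(\nabla_\theta \ell(\theta; x_i, y_i) - \nabla R_n(\theta)\big)^{\top}.
\]
```

Here $W_t$ is a standard Brownian motion and $\theta$ stands for the vectorized parameters. In this generic form, the only difference between the SME and gradient flow is the Gaussian noise term.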

Statistics > Machine Learning
arXiv:2601.21093 (stat)
[Submitted on 28 Jan 2026 (v1), last revised 17 Feb 2026 (this version, v2)]

Title: High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models
Authors: Zhou Fan, Leda Wang

Abstract: We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ increase proportionally, for any sub-linear batch size $\kappa \asymp n^\alpha$ where $\alpha \in [0,1)$, and for a commensurate "critical" scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics for SGD are the same for any batch size scaling $\alpha \in [0,1)$, and that under a commensurate scaling of the learning rate, dynamics of SG...
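
The setting described in the abstract can be illustrated with a small simulation. The sketch below is a minimal toy, not the paper's procedure. The function name `simulate_multipass_sgd`, the `tanh` link, the squared loss, the noiseless labels, and the shuffled-epoch batch sampling are all assumptions made here for illustration. The learning rate is a plain constant rather than the "critical" scaling, whose exact form does not appear in the text above.

```python
import numpy as np

def simulate_multipass_sgd(n=400, d=200, k=2, alpha=0.5, lr=0.02, epochs=5, seed=0):
    """Toy multi-pass, mini-batch SGD on a synthetic multi-index regression problem."""
    rng = np.random.default_rng(seed)
    # Isotropic Gaussian data; n and d are chosen with a fixed ratio n/d.
    X = rng.standard_normal((n, d))
    # Hypothetical ground-truth index directions, scaled so X @ W_star has O(1) entries.
    W_star = rng.standard_normal((d, k)) / np.sqrt(d)
    # Hypothetical link: the label depends on x only through k linear projections.
    y = np.tanh(X @ W_star).sum(axis=1)

    # Sub-linear batch size kappa ~ n^alpha with alpha in [0, 1).
    kappa = max(1, int(round(n ** alpha)))

    W = rng.standard_normal((d, k)) / np.sqrt(d)
    losses = []
    for _ in range(epochs):
        # Multi-pass: each epoch revisits the same n samples in a fresh random order
        # (an assumption; the paper's batch-sampling scheme may differ).
        perm = rng.permutation(n)
        for start in range(0, n, kappa):
            idx = perm[start:start + kappa]
            Z = X[idx] @ W                              # shape (batch, k)
            resid = np.tanh(Z).sum(axis=1) - y[idx]     # shape (batch,)
            # Gradient of 0.5 * mean(resid**2) with respect to W.
            grad = X[idx].T @ (resid[:, None] * (1.0 - np.tanh(Z) ** 2)) / len(idx)
            W -= lr * grad
        # Record the full empirical risk at the end of each pass.
        full_resid = np.tanh(X @ W).sum(axis=1) - y
        losses.append(0.5 * np.mean(full_resid ** 2))
    return kappa, W, losses
```

Calling `simulate_multipass_sgd(alpha=0.0)` and `simulate_multipass_sgd(alpha=0.5)` gives a rough way to compare batch-size scalings at small scale. Any such comparison is only suggestive, because the result in the abstract is asymptotic and relies on a learning-rate scaling that this toy does not reproduce.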
