[2601.21093] High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models
Summary
This paper characterizes the learning dynamics of multi-pass, mini-batch Stochastic Gradient Descent (SGD) for empirical risk minimization in high-dimensional multi-index models with isotropic random data, in the asymptotic regime where the sample size $n$ and data dimension $d$ grow proportionally.
Why It Matters
Understanding the dynamics of SGD is crucial for optimizing machine learning algorithms, especially in high-dimensional settings. This research offers insights into how SGD behaves with different batch sizes and learning rates, which can inform better practices in model training and development.
Key Takeaways
- The paper gives an asymptotically exact characterization of the coordinate-wise dynamics of SGD, in the form of a system of dynamical mean-field equations.
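- It shows that the limiting SGD dynamics are the same for any sub-linear batch size scaling $\kappa \asymp n^\alpha$ with $\alpha \in [0,1)$, under a commensurate "critical" scaling of the learning rate.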
- The study highlights the relationship between SGD, Stochastic Modified Equation (SME), and gradient flow.
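- The limiting SGD dynamics are driven by a scalar Poisson jump process that represents SGD sampling noise; the paper develops an analogous characterization of the SME, which provides a Gaussian diffusion approximation to SGD.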
- The findings can guide practitioners in selecting appropriate learning rates and batch sizes.
Statistics > Machine Learning
arXiv:2601.21093 (stat)
[Submitted on 28 Jan 2026 (v1), last revised 17 Feb 2026 (this version, v2)]
Title: High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models
Authors: Zhou Fan, Leda Wang
Abstract: We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ increase proportionally, for any sub-linear batch size $\kappa \asymp n^\alpha$ where $\alpha \in [0,1)$, and for a commensurate ``critical'' scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics for SGD are the same for any batch size scaling $\alpha \in [0,1)$, and that under a commensurate scaling of the learning rate, dynamics of SG...
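To make the setting in the abstract concrete, here is a minimal sketch of multi-pass, mini-batch SGD on a toy multi-index model with isotropic Gaussian data. It is an illustration under stated assumptions, not the paper's procedure: the `tanh` link, the squared loss, noiseless labels, batches drawn without replacement at each step, and all numeric values (`n`, `d`, `k`, `kappa`, `eta`, `steps`) are hypothetical choices made only for this example. In particular, the fixed `eta` does not implement the "critical" learning-rate scaling mentioned in the abstract.

```python
import math
import random


def project(w, x):
    """Return the k projections <w_j, x> of input x onto the index directions."""
    return [sum(a * b for a, b in zip(wj, x)) for wj in w]


def link(z):
    """Toy link function (an assumption): sum of tanh over the k indices."""
    return sum(math.tanh(zj) for zj in z)


def make_data(n, d, k, rng):
    """Draw isotropic Gaussian inputs and noiseless labels from the toy model."""
    # Ground-truth directions, scaled so each projection <w_j, x> is O(1).
    w_star = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(k)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [link(project(w_star, x)) for x in xs]
    return xs, ys, w_star


def empirical_risk(w, xs, ys):
    """Squared-loss empirical risk over all n samples."""
    return sum((link(project(w, x)) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def sgd_step(w, xs, ys, batch, eta):
    """One mini-batch gradient step on the squared-loss empirical risk."""
    k, d = len(w), len(w[0])
    grad = [[0.0] * d for _ in range(k)]
    for i in batch:
        z = project(w, xs[i])
        resid = link(z) - ys[i]
        for j in range(k):
            # d/dz tanh(z) = 1 - tanh(z)^2
            coef = resid * (1.0 - math.tanh(z[j]) ** 2)
            for c in range(d):
                grad[j][c] += coef * xs[i][c]
    scale = eta / len(batch)
    return [[w[j][c] - scale * grad[j][c] for c in range(d)] for j in range(k)]


def multi_pass_sgd(xs, ys, k, kappa, eta, steps, rng):
    """Run SGD, re-sampling batches of size kappa from the same n samples."""
    n, d = len(xs), len(xs[0])
    w = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(k)]
    risks = [empirical_risk(w, xs, ys)]
    for _ in range(steps):
        batch = rng.sample(range(n), kappa)  # samples are reused across steps
        w = sgd_step(w, xs, ys, batch, eta)
        risks.append(empirical_risk(w, xs, ys))
    return w, risks


rng = random.Random(0)
xs, ys, _ = make_data(n=60, d=20, k=2, rng=rng)
w_hat, risks = multi_pass_sgd(xs, ys, k=2, kappa=8, eta=0.05, steps=300, rng=rng)
print(f"empirical risk: {risks[0]:.4f} -> {risks[-1]:.4f}")
```

Because the 300 steps of batch size 8 draw far more than the 60 available samples, each data point is visited many times, which is what distinguishes the multi-pass setting from one-pass (online) SGD. Varying `kappa` and `eta` together in this sketch is one informal way to explore how batch size and learning rate interact.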