[2601.21093] High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

arXiv - Machine Learning, February 19, 2026

Summary

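This paper studies the learning dynamics of multi-pass, mini-batch Stochastic Gradient Descent (SGD) for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In the regime where the sample size and the data dimension grow proportionally, it gives an asymptotically exact characterization of the coordinate-wise dynamics of SGD for any sub-linear batch size and a matching "critical" scaling of the learning rate.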

Why It Matters

Understanding the dynamics of SGD is crucial for optimizing machine learning algorithms, especially in high-dimensional settings. This research offers insights into how SGD behaves with different batch sizes and learning rates, which can inform better practices in model training and development.

Key Takeaways

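The paper presents an asymptotically exact characterization of the coordinate-wise dynamics of SGD, valid when the sample size $n$ and data dimension $d$ increase proportionally.
It shows that the limiting dynamics are the same for any sub-linear batch size scaling $\kappa \asymp n^\alpha$ with $\alpha \in [0,1)$, with the learning rate scaled commensurately.
The study relates the dynamics of SGD, the Stochastic Modified Equation (SME), and gradient flow.
It develops an analogous characterization of the SME, which provides a Gaussian diffusion approximation to SGD; the limit of SGD's own sampling noise is represented by a scalar Poisson jump process.
The findings may help practitioners reason about the joint choice of learning rate and batch size.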
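
For orientation, the three objects named in the takeaways can be written side by side. The block below is only a sketch in the standard textbook form, assuming batches sampled with replacement and a continuous time variable $s = t\eta$; the symbols $\ell_i$, $L$, $\Sigma$, and $W_s$ are notation introduced here for illustration, and the paper's own normalization and "critical" learning-rate scaling may differ.

```latex
% Empirical risk over n samples, with per-sample loss \ell_i:
%   L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell_i(\theta)

% Mini-batch SGD with batch B_t of size \kappa and learning rate \eta:
\theta_{t+1} = \theta_t - \frac{\eta}{\kappa} \sum_{i \in B_t} \nabla \ell_i(\theta_t)

% Stochastic Modified Equation: a Gaussian diffusion with matching drift and noise covariance,
% where W_s is a standard Brownian motion:
d\Theta_s = -\nabla L(\Theta_s)\, ds + \sqrt{\frac{\eta}{\kappa}}\; \Sigma(\Theta_s)^{1/2}\, dW_s,
\qquad
\Sigma(\theta) = \frac{1}{n} \sum_{i=1}^{n}
  \bigl(\nabla \ell_i(\theta) - \nabla L(\theta)\bigr)
  \bigl(\nabla \ell_i(\theta) - \nabla L(\theta)\bigr)^{\top}

% Gradient flow: the noise-free limit of the equation above:
\frac{d\Theta_s}{ds} = -\nabla L(\Theta_s)
```

In this form the noise term scales with $\sqrt{\eta/\kappa}$, which is one way to see why batch size and learning rate are naturally scaled together.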

Statistics > Machine Learning
arXiv:2601.21093 (stat)
[Submitted on 28 Jan 2026 (v1), last revised 17 Feb 2026 (this version, v2)]

Title: High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models
Authors: Zhou Fan, Leda Wang

Abstract: We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ increase proportionally, for any sub-linear batch size $\kappa \asymp n^\alpha$ where $\alpha \in [0,1)$, and for a commensurate "critical" scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics for SGD are the same for any batch size scaling $\alpha \in [0,1)$, and that under a commensurate scaling of the learning rate, dynamics of SG...
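
The setting in the abstract can be simulated on a small scale. The following is a minimal sketch, not the authors' procedure: it assumes a `tanh` link, a squared loss, a student with the same architecture as the teacher, and reshuffling without replacement on each pass. The function names, the `noise` level, and the `lr` value are all hypothetical choices made for illustration, and `lr` is a plain constant rather than the paper's "critical" scaling.

```python
import numpy as np

def make_multi_index_data(n, d, k, rng, noise=0.1):
    """Isotropic Gaussian inputs; labels depend on x only through k projections."""
    X = rng.standard_normal((n, d))
    W_star = rng.standard_normal((d, k)) / np.sqrt(d)  # hypothetical teacher weights
    y = np.tanh(X @ W_star).sum(axis=1) + noise * rng.standard_normal(n)
    return X, y, W_star

def empirical_risk(W, X, y):
    """Mean squared-error risk over the full training set."""
    pred = np.tanh(X @ W).sum(axis=1)
    return 0.5 * np.mean((pred - y) ** 2)

def multipass_sgd(X, y, k, alpha, lr, epochs, rng):
    """Multi-pass mini-batch SGD with batch size kappa ~ n**alpha."""
    n, d = X.shape
    kappa = max(1, int(round(n ** alpha)))  # sub-linear batch size for alpha in [0, 1)
    W = rng.standard_normal((d, k)) / np.sqrt(d)
    risks = [empirical_risk(W, X, y)]
    for _ in range(epochs):                 # each epoch revisits the same n samples
        perm = rng.permutation(n)
        for start in range(0, n, kappa):
            idx = perm[start:start + kappa]
            Xb, yb = X[idx], y[idx]
            H = np.tanh(Xb @ W)             # (batch, k) hidden activations
            resid = H.sum(axis=1) - yb      # (batch,) prediction errors
            # Gradient of the batch-averaged squared loss with respect to W.
            grad = Xb.T @ (resid[:, None] * (1.0 - H ** 2)) / len(idx)
            W -= lr * grad
        risks.append(empirical_risk(W, X, y))
    return W, risks, kappa

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, _ = make_multi_index_data(n=400, d=200, k=2, rng=rng)
    for alpha in (0.0, 0.5):
        _, risks, kappa = multipass_sgd(X, y, k=2, alpha=alpha, lr=0.02, epochs=10, rng=rng)
        print(f"alpha={alpha}, kappa={kappa}, final risk={risks[-1]:.4f}")
```

Because the learning rate here is held fixed across values of `alpha`, runs with different batch sizes are not expected to match; comparing them under the paper's scaling would require the exact learning-rate normalization, which the truncated abstract does not give.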


Related Articles

[2603.16105] Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

[2603.09643] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

[2602.04943] Graph-Theoretic Analysis of Phase Optimization Complexity in Variational Wave Functions for Heisenberg Antiferromagnets

[2602.00185] QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities
