[2602.06320] High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory
Summary
This paper studies the high-dimensional dynamics of stochastic gradient flow (SGF), a stochastic differential equation that approximates multi-pass stochastic gradient descent (SGD) with small batch sizes, and derives a closed system of equations characterizing its asymptotic behavior.
Why It Matters
Understanding the dynamics of SGD in high dimensions is crucial for improving machine learning models. This research fills a gap in existing analytical frameworks, which did not cover multi-pass SGD with small batch sizes for nonlinear models, offering insights that can inform model training across various settings, including neural networks.
Key Takeaways
- The paper derives a closed system of low-dimensional equations for SGF in high dimensions.
- It utilizes dynamical mean-field theory to analyze the asymptotic behavior of SGD.
- The findings unify existing frameworks and provide a broader understanding of SGD dynamics.
- The approach is applicable to various models, including generalized linear models and neural networks.
- The research extends existing techniques to handle stochasticity in gradient flows.
Statistics > Machine Learning
arXiv:2602.06320 (stat)
[Submitted on 6 Feb 2026 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory
Authors: Sota Nishiyama, Masaaki Imaizumi
Abstract: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass SGD with small batch sizes for nonlinear models is currently missing. In this study, we address this gap by analyzing the high-dimensional dynamics of a stochastic differential equation called a stochastic gradient flow (SGF), which approximates multi-pass SGD in this regime. In the limit where the number of data samples n and the dimension d grow proportionally, we derive a closed system of low-dimensional and continuous-time equations and prove that it characterizes the asymptotic distribution of the SGF parameters. Our theory is based on the dynamical mean-field theory (DMFT) and is applicable to a wide range of models encompassing generalized linear models and two-layer neural networks. We further show that the resulting DMFT equations r...