[2602.01434] Phase Transitions for Feature Learning in Neural Networks
Summary
This paper studies phase transitions in feature learning: the gradient descent dynamics of two-layer neural networks trained on multi-index data, in the proportional regime where the number of samples and the input dimension grow together. It establishes a sample-complexity threshold above which effective feature learning is possible.
Why It Matters
Phase transitions determine when gradient-based training can recover the low-dimensional structure hidden in high-dimensional data. Characterizing the threshold at which feature learning becomes possible informs how much data a given architecture needs and guides the design of network architectures and training algorithms.
Key Takeaways
- Identifies a sample-complexity threshold for effective feature learning in two-layer neural networks trained on multi-index models.
- Establishes how the learning dynamics depend on the network architecture (number of hidden neurons) and the data distribution.
- Explains the role of gradient descent dynamics in recovering the latent low-dimensional subspace.
- Highlights the significance of phase transitions in neural network training.
- Provides a formal framework for analyzing learning in the proportional high-dimensional asymptotics $n, d \to \infty$, $n/d \to \delta$.
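The multi-index setting that these takeaways refer to can be made concrete with a small simulation. The sketch below generates isotropic covariates $x_i \in \mathbb{R}^d$ and responses that depend on $x_i$ only through a $k$-dimensional projection $\Theta_*^{\sf T} x_i$; the link function (`np.tanh` summed over coordinates) is a hypothetical choice for illustration, not one specified by the paper.

```python
import numpy as np

def sample_multi_index(n, d, k, rng):
    """Draw n i.i.d. pairs (x_i, y_i) from a multi-index model.

    The latent subspace is spanned by Theta_* (orthonormal columns);
    responses depend on x only through the projection x @ Theta_*.
    """
    # Random k-dimensional latent subspace with orthonormal columns.
    theta_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    x = rng.standard_normal((n, d))      # isotropic covariates
    z = x @ theta_star                   # k-dimensional projections
    y = np.tanh(z).sum(axis=1)           # illustrative link: y depends on x only via z
    return x, y, theta_star

rng = np.random.default_rng(0)
x, y, theta_star = sample_multi_index(n=2000, d=100, k=2, rng=rng)
```

Feature learning, in this vocabulary, means recovering `span(theta_star)` from the pairs `(x, y)` alone.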
Computer Science > Machine Learning
arXiv:2602.01434 (cs)
[Submitted on 1 Feb 2026 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: Phase Transitions for Feature Learning in Neural Networks
Authors: Andrea Montanari, Zihao Wang
Abstract: According to a popular viewpoint, neural networks learn from data by first identifying low-dimensional representations, and subsequently fitting the best model in this space. Recent works provide a formalization of this phenomenon when learning multi-index models. In this setting, we are given $n$ i.i.d. pairs $({\boldsymbol x}_i,y_i)$, where the covariate vectors ${\boldsymbol x}_i\in\mathbb{R}^d$ are isotropic, and responses $y_i$ only depend on ${\boldsymbol x}_i$ through a $k$-dimensional projection ${\boldsymbol \Theta}_*^{\sf T}{\boldsymbol x}_i$. Feature learning amounts to learning the latent space spanned by ${\boldsymbol \Theta}_*$. In this context, we study the gradient descent dynamics of two-layer neural networks under the proportional asymptotics $n,d\to\infty$, $n/d\to\delta$, while the dimension of the latent space $k$ and the number of hidden neurons $m$ are kept fixed. Earlier work establishes that feature learning via polynomial-time algorithms is possible if $\delta> \delta_{\text{alg}}$, for $\delta_{\text{alg}}$ a threshold depending on the data distribution, and is impossible (wi...
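The gradient descent dynamics described in the abstract can be sketched as follows: a two-layer network with $m$ hidden neurons is trained on multi-index data (here $k=1$), and we track how much of the first-layer weight mass falls inside the latent subspace $\mathrm{span}(\Theta_*)$. The architecture, step size, fixed second layer, and alignment metric are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k, m = 4000, 50, 1, 8
theta_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
x = rng.standard_normal((n, d))
y = np.tanh(x @ theta_star).sum(axis=1)        # single-index target (k = 1)

w = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer weights
a = np.ones(m) / m                             # second layer held fixed, for simplicity
lr = 0.5

def alignment(w, theta_star):
    # Fraction of first-layer weight mass lying in span(Theta_*).
    proj = w @ theta_star @ theta_star.T
    return np.linalg.norm(proj) / np.linalg.norm(w)

align_init = alignment(w, theta_star)          # ~ sqrt(k/d) at random init

for _ in range(200):
    h = np.tanh(x @ w.T)                       # hidden activations, shape (n, m)
    resid = h @ a - y                          # prediction residuals
    # Gradient of the mean squared error w.r.t. the first-layer weights:
    # tanh'(u) = 1 - tanh(u)^2.
    grad_w = ((resid[:, None] * a) * (1 - h**2)).T @ x / n
    w -= lr * grad_w

print(f"alignment: init {align_init:.2f} -> trained {alignment(w, theta_star):.2f}")
```

In the proportional regime the interesting question, which this code does not resolve, is how the final alignment behaves as $\delta = n/d$ crosses the threshold $\delta_{\text{alg}}$: here $n/d = 80$ is chosen large so that alignment visibly grows.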