[2604.04655] Grokking as Dimensional Phase Transition in Neural Networks
Computer Science > Machine Learning

arXiv:2604.04655 (cs) [Submitted on 6 Apr 2026]

Title: Grokking as Dimensional Phase Transition in Neural Networks
Authors: Ping Wang

Abstract: Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: effective dimensionality $D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d.\ Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.

Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Adaptation and Self-Organizing Systems (nlin.AO)
Cite as: arXiv:2604.04655 [cs.LG] (or arXiv:2604.04655v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.04655
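The abstract's sub-/super-diffusive distinction can be made concrete with a standard anomalous-diffusion estimator: fit the exponent $\alpha$ in $\mathrm{MSD}(\tau) \sim \tau^{\alpha}$ from a cumulative trajectory, where uncorrelated increments give $\alpha \approx 1$ and positively correlated increments give $\alpha > 1$ over the correlated lag range. This is a generic sketch of that contrast, not the paper's actual estimator; the moving-average correlation model and all parameter choices below are assumptions for illustration.

```python
import numpy as np

def msd_exponent(steps):
    """Fit alpha in MSD(lag) ~ lag**alpha for the cumulative walk of `steps`.

    This log-log regression over logarithmically spaced lags is a generic
    diffusion-exponent estimator, analogous in spirit to the abstract's
    sub-diffusive (D < 1) vs super-diffusive (D > 1) distinction.
    """
    x = np.cumsum(steps)
    lags = np.unique(np.logspace(0, np.log10(len(x) // 4), 20).astype(int))
    msd = np.array([np.mean((x[l:] - x[:-l]) ** 2) for l in lags])
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

rng = np.random.default_rng(0)
n = 20_000
# i.i.d. Gaussian increments: ordinary diffusion, exponent near 1
a_iid = msd_exponent(rng.normal(size=n))
# positively correlated increments (moving average): super-diffusive
# behaviour over the fitted lag range, exponent above 1
a_corr = msd_exponent(np.convolve(rng.normal(size=n), np.ones(50) / 50,
                                  mode="same"))
print(f"iid: {a_iid:.2f}  correlated: {a_corr:.2f}")
```

Consistent with the abstract's claim about synthetic i.i.d. Gaussian gradients, the uncorrelated walk stays near exponent 1, while the correlated walk (a stand-in for backpropagation-induced correlations) exceeds it.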