[2602.10496] Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks
Summary
This paper explores the geometric structure of learning dynamics in transformer models, revealing that training trajectories collapse onto low-dimensional execution manifolds, with implications for interpretability and training strategies.
Why It Matters
Understanding the low-dimensional execution manifolds in transformer learning dynamics provides insights into how these models operate in high-dimensional spaces. This has implications for improving model interpretability, optimizing training processes, and leveraging overparameterization effectively in neural networks.
Key Takeaways
- Transformer training trajectories collapse onto low-dimensional manifolds of dimension 3–4.
- Sharp attention concentration emerges from saturation along routing coordinates within these manifolds.
- SGD commutators align preferentially with the execution subspace early in training.
- Sparse autoencoders capture auxiliary routing structures but do not isolate execution.
- The findings suggest a geometric framework for understanding transformer learning dynamics.
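The dimensional collapse in the first takeaway can be probed with a standard technique: apply PCA to flattened parameter snapshots collected along a training run and count how many components are needed to explain most of the variance. The sketch below is illustrative, not the paper's method; the function name, the variance threshold, and the synthetic rank-3 trajectory are all assumptions.

```python
import numpy as np

def effective_dimension(snapshots, var_threshold=0.95):
    """Estimate the effective dimension of a training trajectory.

    snapshots: (T, d) array of flattened parameter vectors, one row
    per training step. Returns the number of principal components
    needed to explain `var_threshold` of the total variance.
    """
    centered = snapshots - snapshots.mean(axis=0)
    # Singular values of the centered trajectory give PCA variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var), var_threshold) + 1)

# Synthetic check: a trajectory confined to a 3-D subspace of d=128,
# mirroring the reported collapse onto 3-4 dimensions.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(128, 3)))[0]  # orthonormal 3-D basis
coords = rng.normal(size=(500, 3))                  # 500 "training steps"
trajectory = coords @ basis.T                       # shape (500, 128)
print(effective_dimension(trajectory))
```

On the synthetic rank-3 trajectory this reports 3, matching the subspace it was built in; on real checkpoints one would substitute the actual parameter snapshots.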
Computer Science > Machine Learning
arXiv:2602.10496 (cs)
[Submitted on 11 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2)]
Title: Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks
Authors: Yongzhong Xu
Abstract: We investigate the geometric structure of learning dynamics in overparameterized transformer models through carefully controlled modular arithmetic tasks. Our primary finding is that despite operating in high-dimensional parameter spaces ($d=128$), transformer training trajectories rapidly collapse onto low-dimensional execution manifolds of dimension $3$--$4$. This dimensional collapse is robust across random seeds and moderate task difficulties, though the orientation of the manifold in parameter space varies between runs. We demonstrate that this geometric structure underlies several empirically observed phenomena: (1) sharp attention concentration emerges as saturation along routing coordinates within the execution manifold, (2) SGD commutators are preferentially aligned with the execution subspace (up to $10\times$ random baseline) early in training, with $>92\%$ of non-commutativity confined to orthogonal staging directions and this alignment decreasing as training converges, and (3) sparse autoencoders captur...
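The abstract describes "carefully controlled modular arithmetic tasks" as the training setting. A minimal sketch of such a dataset, assuming the common modular-addition formulation (the paper's exact task format may differ):

```python
import itertools

def modular_addition_dataset(p):
    """Enumerate all (a, b, (a + b) mod p) triples for addition mod p.

    This is the standard fully-enumerable task family used to study
    learning dynamics under controlled conditions: the input space is
    finite, so train/test splits and difficulty are easy to control.
    """
    return [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]

data = modular_addition_dataset(7)
print(len(data))   # 49 examples: every ordered pair (a, b) mod 7
print(data[:3])    # [(0, 0, 0), (0, 1, 1), (0, 2, 2)]
```

Because the full input space has only $p^2$ examples, task difficulty can be varied simply by changing $p$ or the train/test split fraction.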