[2602.15997] Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks
Summary
This article examines the mechanisms of capability emergence in neural networks, identifying a scale-invariant representation collapse and a top-down reorganization of layers during training, consistent across model sizes and tasks.
Why It Matters
Understanding capability emergence is crucial for advancing neural network design and improving AI performance. This research provides insights into the geometric properties that influence learning, which can inform future developments in machine learning and AI systems.
Key Takeaways
- Capability emergence involves a universal representation collapse during training, consistent across different model sizes.
- The collapse propagates top-down through network layers, challenging traditional bottom-up learning assumptions.
- Geometric measures predict coarse task difficulty but not fine-grained emergence timing, marking a limit of current geometric predictors.
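The top-down claim in the takeaways can be made concrete: if an effective-rank measure is tracked per layer over training, "top-down" means that collapse onset times are non-increasing with depth (output-side layers collapse first). A minimal sketch of such a check; the function names and the fixed-fraction threshold rule are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

def collapse_onsets(rank_by_layer: np.ndarray, frac: float = 0.5) -> np.ndarray:
    """First training step at which each layer's effective rank falls
    below `frac` of its initial value.  rank_by_layer: (layers, steps)."""
    thresholds = frac * rank_by_layer[:, :1]
    below = rank_by_layer < thresholds
    # argmax returns the index of the first True; layers that never
    # collapse are marked with -1
    onsets = below.argmax(axis=1)
    onsets[~below.any(axis=1)] = -1
    return onsets

def is_top_down(onsets: np.ndarray) -> bool:
    """Top-down propagation: deeper (output-side) layers collapse no later
    than shallower ones, i.e. onsets are non-increasing with depth."""
    valid = onsets[onsets >= 0]
    return bool(np.all(np.diff(valid) <= 0))

# Toy trajectories, ordered input side -> output side: layer 0 collapses
# at step 30, layer 1 at step 20, layer 2 at step 10 -- top-down.
steps = np.arange(50)
ranks = np.stack([np.where(steps < t, 10.0, 2.0) for t in (30, 20, 10)])
print(collapse_onsets(ranks))                # [30 20 10]
print(is_top_down(collapse_onsets(ranks)))   # True
```

A bottom-up run would instead show onset times increasing with depth, so the same test distinguishes the two hypotheses on any per-layer rank trajectory.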
arXiv:2602.15997 (cs) — Computer Science > Machine Learning
[Submitted on 17 Feb 2026]
Title: Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks
Authors: Jayadev Billa
Abstract: Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K–85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M–2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210× parameter range (e.g., modular arithmetic collapses to RankMe ≈ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task × model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75–100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric p...
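The abstract's "RankMe ≈ 2.0" refers to a smooth effective-rank measure of a representation matrix; the sketch below assumes the common definition (exponential of the entropy of the L1-normalized singular-value spectrum) and is a minimal illustration, not the authors' implementation:

```python
import numpy as np

def rankme(Z: np.ndarray, eps: float = 1e-7) -> float:
    """Smooth effective rank of a representation matrix Z (samples x features).

    RankMe = exp(entropy of the normalized singular-value spectrum).
    It approaches 1 when representations collapse onto a single direction
    and approaches min(n, d) when the spectrum is flat.
    """
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / (s.sum() + eps) + eps          # normalized spectrum, numerically safe
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# A collapsed representation: every sample lies on one direction (rank 1).
collapsed = np.outer(rng.normal(size=256), rng.normal(size=64))
# An uncollapsed one: isotropic Gaussian features.
full = rng.normal(size=(256, 64))

print(round(rankme(collapsed), 1))  # 1.0 -- rank-1 collapse
print(rankme(full) > 50)            # True -- near min(n, d)
```

A task-specific "floor" in the paper's sense would then correspond to this quantity dropping to a low, size-independent value (e.g. ≈ 2.0 for modular arithmetic) early in training.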