[2602.18649] Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
Summary
This paper investigates a holographic encoding principle in neural networks: learned algorithms occupy a low-dimensional global subspace of the training trajectory while individual weight matrices remain effectively full-rank, with consequences for model compression and interpretability.
Why It Matters
Understanding how learned algorithms are encoded in parameters is crucial for advancing machine learning techniques. This research shows how a network's learning dynamics can be confined to a few dimensions even though its weights resist low-rank compression, which has implications for model efficiency, interpretability, and the design of compression methods.
Key Takeaways
- Grokking leads to low-dimensional learning dynamics despite high-dimensional parameter spaces.
- Neural networks can achieve high accuracy through globally low-rank but locally full-rank structures.
- Static decompositions may overlook task-relevant structures in learned algorithms.
- The holographic encoding principle has implications for model compression and interpretability.
- Coordinated weight updates across matrices over training, rather than static per-matrix structure, are essential for understanding neural network computations.
Computer Science > Machine Learning
arXiv:2602.18649 (cs) [Submitted on 20 Feb 2026]
Title: Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
Authors: Yongzhong Xu
Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can a low-dimensional learning process produce solutions that resist low-dimensional compression? We investigate this question in multi-task modular arithmetic, training shared-trunk Transformers with separate heads for addition, multiplication, and a quadratic operation modulo 97. Across three model scales (315K--2.2M parameters) and five weight decay settings, we compare three reconstruction methods: per-matrix SVD, joint cross-matrix SVD, and trajectory PCA. Across all conditions, grokking trajectories are confined to a 2--6 dimensional global subspace, while individual weight matrices remain effectively full-rank. Reconstruction from 3--5 trajectory PCs recovers over 95% of final accuracy, whereas both per-matrix and joint SVD fail at sub-full rank. Even when static decompositions capture most spectral energy, they destroy task-relevant structure. These results ...