[2602.18649] Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms

arXiv - AI · 4 min read · Article

Summary

This paper examines a holographic encoding principle in neural networks: learned algorithms follow a globally low-rank learning trajectory while individual weight matrices remain locally full-rank, with consequences for model compression and interpretability.

Why It Matters

Understanding how learned algorithms are encoded is crucial for advancing machine learning techniques. This research shows how a network's learning dynamics can be confined to a low-dimensional subspace even though its solutions resist low-dimensional compression, which has implications for model efficiency, interpretability, and the development of more effective AI systems.

Key Takeaways

  • Grokking produces low-dimensional learning dynamics despite extremely high-dimensional parameter spaces.
  • Networks reach high accuracy via trajectories that are globally low-rank, even though individual weight matrices remain effectively full-rank.
  • Static decompositions such as per-matrix SVD can overlook task-relevant structure in learned algorithms (see the sketch after this list).
  • The holographic encoding principle has implications for model compression and interpretability.
  • Dynamic coordination across weight updates is essential for understanding neural network computations.
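
The tension between the second and third takeaways is worth making concrete: a truncated per-matrix SVD can retain almost all of a weight matrix's spectral energy yet, per the paper, still fail at any sub-full rank. A minimal sketch of that static baseline (not the paper's code; the matrix size and power-law spectrum are illustrative stand-ins):

```python
# Hypothetical sketch of the static per-matrix SVD baseline (not the
# paper's code; matrix size and spectrum are illustrative stand-ins).
import numpy as np

def svd_truncate(W: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of W in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# Stand-in for one trained weight matrix: orthogonal factors with a
# power-law spectrum, so low ranks already capture most spectral energy.
Q, _ = np.linalg.qr(rng.standard_normal((128, 128)))
R, _ = np.linalg.qr(rng.standard_normal((128, 128)))
W = (Q * np.arange(1, 129, dtype=float) ** -1.0) @ R.T

for k in (4, 16, 64, 128):
    W_k = svd_truncate(W, k)
    energy = 1 - np.linalg.norm(W - W_k) ** 2 / np.linalg.norm(W) ** 2
    print(f"rank {k:3d}: {energy:6.1%} spectral energy retained")
```

The paper's point is that this spectral-energy accounting is misleading: even when a static decomposition captures most of the energy, reconstruction accuracy collapses below full rank.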

Computer Science > Machine Learning
arXiv:2602.18649 (cs) · Submitted on 20 Feb 2026
Title: Global Low-Rank, Local Full-Rank: The Holographic Encoding of Learned Algorithms
Authors: Yongzhong Xu

Abstract: Grokking -- the abrupt transition from memorization to generalization after extended training -- has been linked to the emergence of low-dimensional structure in learning dynamics. Yet neural network parameters inhabit extremely high-dimensional spaces. How can a low-dimensional learning process produce solutions that resist low-dimensional compression? We investigate this question in multi-task modular arithmetic, training shared-trunk Transformers with separate heads for addition, multiplication, and a quadratic operation modulo 97. Across three model scales (315K--2.2M parameters) and five weight decay settings, we compare three reconstruction methods: per-matrix SVD, joint cross-matrix SVD, and trajectory PCA. Across all conditions, grokking trajectories are confined to a 2--6 dimensional global subspace, while individual weight matrices remain effectively full-rank. Reconstruction from 3--5 trajectory PCs recovers over 95% of final accuracy, whereas both per-matrix and joint SVD fail at sub-full rank. Even when static decompositions capture most spectral energy, they destroy task-relevant structure. These results ...
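
Trajectory PCA, the method the abstract credits with near-lossless reconstruction, decomposes the training trajectory rather than any individual matrix: flatten each checkpoint into one long vector, run PCA across the checkpoints, and rebuild the final parameters from a few principal components. A minimal sketch of that idea on synthetic data -- the checkpoint matrix, its sizes, and the planted 4-dimensional trajectory are stand-ins, not the paper's setup:

```python
# Hedged sketch of trajectory PCA over flattened parameter snapshots.
# `snapshots` stands in for checkpoints saved during training; here the
# trajectory is synthetic, planted in a 4-dimensional subspace.
import numpy as np

def trajectory_pca_reconstruct(snapshots: np.ndarray, k: int) -> np.ndarray:
    """Project the final checkpoint onto the top-k PCs of the trajectory."""
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # SVD of the (T x P) trajectory matrix; rows = flattened checkpoints.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    pcs = Vt[:k]                    # top-k principal directions in parameter space
    coords = pcs @ centered[-1]     # k coordinates of the final checkpoint
    return mean + coords @ pcs      # rank-k reconstruction of final parameters

rng = np.random.default_rng(1)
T, P = 200, 10_000                  # checkpoints x flattened parameter count
latent = rng.standard_normal((T, 4))            # a genuinely 4-dim trajectory
basis = rng.standard_normal((4, P))
snapshots = latent @ basis + 0.01 * rng.standard_normal((T, P))

recon = trajectory_pca_reconstruct(snapshots, k=4)
err = np.linalg.norm(recon - snapshots[-1]) / np.linalg.norm(snapshots[-1])
print(f"relative reconstruction error with 4 PCs: {err:.3e}")
```

On the paper's models, 3--5 such components recover over 95% of final accuracy, consistent with the grokking trajectory living in a 2--6 dimensional global subspace.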

