[2602.18523] The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure

arXiv - AI · 4 min read

Summary

This article summarizes a geometric analysis of multi-task grokking in machine learning, detailing five key phenomena observed when training shared-trunk Transformers on multiple modular-arithmetic tasks.

Why It Matters

Understanding multi-task grokking matters for building machine learning models that generalize across tasks rather than memorize them. This research shows how weight decay shapes grokking timescales and generalization quality, which can inform regularization choices in future AI systems.

Key Takeaways

  • Grokking transitions from memorization to generalization occur in a staggered, task-dependent order: multiplication generalizes first, then squaring, then addition.
  • Optimization trajectories are confined to a low-dimensional execution manifold, with commutator defects orthogonal to it reliably preceding generalization.
  • Weight decay significantly impacts grokking timescales and model performance, revealing distinct operational regimes.
  • Final solutions are fragile and highly sensitive to perturbations, indicating a need for robust training methods.
  • Redundant parameters in overparameterized models can recover performance even after significant gradient component removal.
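
The "low-dimensional manifold" claim above is the kind of property one can probe with a PCA of training checkpoints. The sketch below is not the paper's method (the paper uses commutator defects, which are defined differently); it is a minimal, assumption-laden illustration of the general idea: flatten the weights at each checkpoint, find how many principal directions explain most of the trajectory's variance, and treat each checkpoint's residual outside those directions as an "off-manifold" component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training trajectory": T checkpoints of a flattened weight vector that
# mostly moves inside a k-dimensional subspace, plus small transverse noise.
# In a real experiment, `trajectory` would hold flattened model checkpoints.
T, D, k = 200, 50, 4
basis = np.linalg.qr(rng.normal(size=(D, k)))[0]   # k orthonormal directions
scales = np.array([10.0, 8.0, 6.0, 4.0])           # in-plane std deviations
coords = rng.normal(size=(T, k)) * scales
trajectory = coords @ basis.T + 0.01 * rng.normal(size=(T, D))

# PCA of the centered trajectory: how many components explain 99% variance?
X = trajectory - trajectory.mean(axis=0)
_, s, vt = np.linalg.svd(X, full_matrices=False)
var_ratio = s**2 / np.sum(s**2)
eff_dim = int(np.searchsorted(np.cumsum(var_ratio), 0.99) + 1)

# "Off-manifold defect": the norm of each checkpoint's residual outside the
# top eff_dim principal directions.
proj = X @ vt[:eff_dim].T @ vt[:eff_dim]
defect = np.linalg.norm(X - proj, axis=1)

print(eff_dim)              # 4: the in-plane variance dwarfs the noise
print(float(defect.mean())) # small: the trajectory hugs the subspace
```

In this toy setup the recovered dimensionality matches the planted subspace; the abstract's observation that final solutions occupy only a handful of principal trajectory directions is the analogous statement for real training runs.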

Computer Science > Machine Learning
arXiv:2602.18523 (cs) · Submitted on 19 Feb 2026
Title: The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
Authors: Yongzhong Xu

Abstract: Grokking -- the abrupt transition from memorization to generalization long after near-zero training loss -- has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task (mod-add + mod-mul) and tri-task (mod-add + mod-mul + mod-sq) objectives across a systematic weight decay sweep. Five consistent phenomena emerge. (1) Staggered grokking order: multiplication generalizes first, followed by squaring, then addition, with consistent delays across seeds. (2) Universal integrability: optimization trajectories remain confined to an empirically invariant low-dimensional execution manifold; commutator defects orthogonal to this manifold reliably precede generalization. (3) Weight decay phase structure: grokking timescale, curvature depth, reconstruction threshold, and defect lead covary systematically with weight decay, revealing distinct dynamical regimes and a sharp no-decay failure mode. (4) Holographic incompressibility: final solutions occupy only 4--8 principal trajector...
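To make the dual-task setup concrete, here is a minimal sketch of how a mod-add + mod-mul dataset might be constructed. The modulus, token layout, and task tags are assumptions for illustration; the summary does not specify the paper's exact encoding.

```python
import itertools

P = 97  # a prime modulus, common in grokking setups (assumed; the paper's
        # actual modulus is not given in this summary)

# Two modular-arithmetic tasks sharing one trunk. Each example is a token
# sequence (a, op, b, "=") paired with its answer; the op token lets a
# shared-trunk model distinguish the objectives.
OPS = {
    "add": lambda a, b: (a + b) % P,
    "mul": lambda a, b: (a * b) % P,
}

def make_dual_task_dataset():
    data = []
    for op_name, op in OPS.items():
        for a, b in itertools.product(range(P), repeat=2):
            data.append(((a, op_name, b, "="), op(a, b)))
    return data

dataset = make_dual_task_dataset()
print(len(dataset))  # 2 * 97 * 97 = 18818 examples across both tasks
```

A tri-task variant would add a squaring task (e.g. `a -> a**2 % P`) in the same style; the abstract's "staggered grokking order" refers to when each of these per-task losses generalizes during a single shared training run.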
