[2602.16849] On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
Summary
This paper analyzes how two-layer neural networks learn to solve the modular addition task, providing insights into feature learning, training dynamics, and the concept of grokking.
Why It Matters
Understanding the mechanisms behind neural networks' learning processes is crucial for advancing machine learning techniques. This research sheds light on how networks can effectively combine learned features to solve complex tasks, which can inform future developments in AI and optimization strategies.
Key Takeaways
- The paper formalizes a diversification condition that explains how individually learned Fourier features combine into a global solution to modular addition.
- It introduces a lottery ticket mechanism to explain feature emergence under random initialization.
- The study characterizes grokking as a three-stage process involving memorization and generalization phases.
Computer Science > Machine Learning
arXiv:2602.16849 (cs)
[Submitted on 18 Feb 2026]
Title: On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
Authors: Jianliang He, Leda Wang, Siyu Chen, Zhuoran Yang
Abstract: We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task. Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics. While prior work has identified that individual neurons learn single-frequency Fourier features and phase alignment, it does not fully explain how these features combine into a global solution. We bridge this gap by formalizing a diversification condition that emerges during training when the network is overparametrized, consisting of two parts: phase symmetry and frequency diversification. We prove that these properties allow the network to collectively approximate a flawed indicator function on the correct logits for the modular addition task. While individual neurons produce noisy signals, phase symmetry enables a majority-voting scheme that cancels out the noise, allowing the network to robustly identify the correct sum. Furthermore, we explain the emergence of these features under random initialization via a lottery ticket ...
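The frequency-diversification and majority-voting idea described in the abstract can be sketched numerically. The snippet below is an illustration, not the paper's construction: it assumes a trained network whose neuron groups each contribute a single-frequency term cos(2πf(a+b−c)/p) to the logit for class c (the frequency set chosen here is an arbitrary example). Each term peaks exactly when c = (a+b) mod p, and summing over diverse frequencies cancels the off-target noise, so the argmax over logits recovers the correct sum.

```python
import numpy as np

p = 97                      # modulus (prime, as in standard modular-addition setups)
freqs = [3, 11, 25, 40]     # hypothetical diverse frequency set; any distinct nonzero freqs work

def logits(a, b):
    # One idealized single-frequency "neuron group" per f: its contribution to
    # the logit of class c is cos(2*pi*f*(a+b-c)/p). All terms equal 1 only
    # when c == (a+b) mod p; elsewhere they partially cancel (majority vote).
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * f * (a + b - c) / p) for f in freqs)

a, b = 17, 55
scores = logits(a, b)
print(scores.argmax(), (a + b) % p)  # both 72: the vote picks the true sum
```

Because p is prime, no nonzero residue a+b−c can zero out every frequency term simultaneously, so the peak at the correct class is strict; adding more frequencies only sharpens the separation, which mirrors the role of frequency diversification in the paper's argument.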