[2602.16849] On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking


arXiv - Machine Learning · 4 min read · Article

Summary

This paper analyzes how two-layer neural networks learn to solve the modular addition task, providing insights into feature learning, training dynamics, and the concept of grokking.

Why It Matters

Understanding the mechanisms behind neural networks' learning processes is crucial for advancing machine learning techniques. This research sheds light on how networks can effectively combine learned features to solve complex tasks, which can inform future developments in AI and optimization strategies.

Key Takeaways

  • The paper formalizes a diversification condition, comprising phase symmetry and frequency diversification, that enables the network to solve the modular addition task.
  • It explains the emergence of Fourier features under random initialization via a lottery ticket mechanism.
  • It characterizes grokking as a three-stage training process spanning memorization and generalization phases.

Computer Science > Machine Learning
arXiv:2602.16849 (cs) [Submitted on 18 Feb 2026]

Title: On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
Authors: Jianliang He, Leda Wang, Siyu Chen, Zhuoran Yang

Abstract: We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task. Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics. While prior work has identified that individual neurons learn single-frequency Fourier features and phase alignment, it does not fully explain how these features combine into a global solution. We bridge this gap by formalizing a diversification condition that emerges during training when overparametrized, consisting of two parts: phase symmetry and frequency diversification. We prove that these properties allow the network to collectively approximate a flawed indicator function on the correct logic for the modular addition task. While individual neurons produce noisy signals, the phase symmetry enables a majority-voting scheme that cancels out noise, allowing the network to robustly identify the correct sum. Furthermore, we explain the emergence of these features under random initialization via a lottery ticket...
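The majority-voting picture from the abstract can be illustrated with a small NumPy sketch. This is not the paper's exact construction: the modulus, the number of frequencies, and the cosine logit form `cos(2*pi*k*(a+b-c)/p)` are illustrative assumptions. Each single-frequency feature is ambiguous on its own, but summing over a diverse set of frequencies produces a score that peaks only at the correct sum:

```python
import numpy as np

p = 59  # modulus (illustrative choice; the paper studies general p)
rng = np.random.default_rng(0)

# A few distinct nonzero frequencies, mimicking "frequency diversification".
freqs = rng.choice(np.arange(1, p), size=5, replace=False)

def logits(a, b):
    """Score every candidate answer c in {0, ..., p-1}.

    Each frequency k contributes cos(2*pi*k*(a + b - c) / p), which is a
    noisy signal on its own; summing over diverse frequencies acts like a
    majority vote, so the total peaks sharply at c = (a + b) mod p, where
    every cosine term equals 1 simultaneously.
    """
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in freqs)

a, b = 17, 45
pred = int(np.argmax(logits(a, b)))
print(pred, (a + b) % p)  # both equal (a + b) mod p, i.e. 3
```

Because p is prime, the only c with k*(a+b-c) ≡ 0 (mod p) for every frequency k is c = (a+b) mod p, so the argmax is unique; adding more frequencies only sharpens the peak, which is the noise-cancellation effect the abstract describes.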
