[2603.23784] Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic

[2603.23784] Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2603.23784: Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic

Computer Science > Machine Learning arXiv:2603.23784 (cs) [Submitted on 24 Mar 2026] Title:Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic Authors:Anand Swaroop View a PDF of the paper titled Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic, by Anand Swaroop View PDF HTML (experimental) Abstract:Grokking-the phenomenon where validation accuracy of neural networks on modular addition of two integers rises long after training data has been memorized-has been characterized in previous works as producing sinusoidal input weight distributions in transformers and multi-layer perceptrons (MLPs). We find empirically that ReLU MLPs in our experimental setting instead learn near-binary square wave input weights, where intermediate-valued weights appear exclusively near sign-change boundaries, alongside output weight distributions whose dominant Fourier phases satisfy a phase-sum relation $\phi_{\mathrm{out}} = \phi_a + \phi_b$; this relation holds even when the model is trained on noisy data and fails to grok. We extract the frequency and phase of each neuron's weights via DFT and construct an idealized MLP: Input weights are replaced by perfect binary square waves and output weights by cosines, both parametrized by the frequencies, phases, and amplitudes extracted from the dominant Fourier components of the real model weights. This idealized model achieves 95.5% accuracy whe...

Originally published on March 26, 2026. Curated by AI News.

Related Articles

Machine Learning

Ml project user give dataset and I give best model [D] [P]

Tl,dr : suggest me a solution to create a ai ml project where user will give his dataset as input and the project should give best model ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML Reviewer Acknowledgement

Hi, I'm a little confused about ICML discussion period Does the period for reviewer acknowledging responses have already ended? One of th...

Reddit - Machine Learning · 1 min ·
Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[D] ICML reviewer making up false claim in acknowledgement, what to do?

In a rebuttal acknowledgement we received, the reviewer made up a claim that our method performs worse than baselines with some hyperpara...

Reddit - Machine Learning · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime