[2602.19533] Grokking Finite-Dimensional Algebra
Summary
This paper explores the grokking phenomenon in neural networks, focusing on learning multiplication in finite-dimensional algebras, extending prior work on group operations to more complex algebraic structures.
Why It Matters
Studying grokking in finite-dimensional algebras shows how neural networks move from memorization to generalization on a broader class of tasks than group operations. Because multiplication in such an algebra is a bilinear product fixed by its structure tensor, the setting makes it possible to vary algebraic properties such as commutativity, associativity, and unitality and to observe how each one affects generalization.
Key Takeaways
- Grokking involves a transition from memorization to generalization in neural networks.
- The study extends the concept of grokking to finite-dimensional algebras beyond group operations.
- Algebraic properties such as commutativity, associativity, and unitality influence whether and when grokking emerges.
- Properties of the algebra's structure tensor affect how well models generalize.
- The research provides a unified framework for understanding grokking across different algebraic structures.
Computer Science > Machine Learning
arXiv:2602.19533 (cs) [Submitted on 23 Feb 2026]
Title: Grokking Finite-Dimensional Algebra
Authors: Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau
Abstract: This paper investigates the grokking phenomenon, which refers to the sudden transition from a long memorization to generalization observed during neural networks training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of algebraic elements. This leads us to experimentally investigate the following core questions: (i) how do algebraic properties such as commutativity, associativity, and unitality influence both the emergence and timing of grokking, (ii) how structural pr...
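The statement that multiplication in a finite-dimensional algebra is a bilinear product specified by a structure tensor, with group operations as a special case, can be made concrete with a small sketch. This is an illustration only, not the authors' code: the index convention `T[i, j, k]` (the coefficient of basis element `e_k` in `e_i * e_j`) and every function name below are assumptions chosen for the example.

```python
import numpy as np

def multiply(T, x, y):
    """Bilinear product: (x * y)_k = sum_{i,j} T[i, j, k] * x[i] * y[j]."""
    return np.einsum("ijk,i,j->k", T, x, y)

def group_algebra_tensor(n):
    """Structure tensor for the cyclic group Z_n: e_i * e_j = e_{(i + j) mod n}."""
    T = np.zeros((n, n, n))
    for i in range(n):
        for j in range(n):
            T[i, j, (i + j) % n] = 1.0
    return T

def cross_product_tensor():
    """Structure tensor of (R^3, cross product): non-associative, non-commutative."""
    T = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        T[i, j, k] = 1.0   # e_i x e_j = e_k
        T[j, i, k] = -1.0  # e_j x e_i = -e_k
    return T

def is_commutative(T):
    """x * y == y * x for all x, y  iff  T[i, j, k] == T[j, i, k]."""
    return bool(np.allclose(T, T.transpose(1, 0, 2)))

def is_associative(T):
    """Compare (e_i e_j) e_l with e_i (e_j e_l) on all basis triples."""
    left = np.einsum("ijm,mlk->ijlk", T, T)
    right = np.einsum("jlm,imk->ijlk", T, T)
    return bool(np.allclose(left, right))

# Complex numbers as a 2-dimensional real algebra with basis (1, i).
complex_T = np.zeros((2, 2, 2))
complex_T[0, 0, 0] = 1.0   # 1 * 1 = 1
complex_T[0, 1, 1] = 1.0   # 1 * i = i
complex_T[1, 0, 1] = 1.0   # i * 1 = i
complex_T[1, 1, 0] = -1.0  # i * i = -1

if __name__ == "__main__":
    # (1 + 2i)(3 + 4i) = -5 + 10i
    print(multiply(complex_T, np.array([1.0, 2.0]), np.array([3.0, 4.0])))

    # Group operation as a special case: one-hot inputs give a one-hot output.
    Z5 = group_algebra_tensor(5)
    a, b = np.eye(5)[2], np.eye(5)[4]
    print(int(np.argmax(multiply(Z5, a, b))))  # index of (2 + 4) mod 5

    for name, T in [("Z_5", Z5), ("cross", cross_product_tensor())]:
        print(name, is_commutative(T), is_associative(T))
```

With one-hot inputs, the product under the group-algebra tensor is again one-hot, which is how a group operation appears as a special case of this bilinear setting. The cross-product tensor gives a compact example of an algebra that is neither commutative nor associative.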