[2602.21797] Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
Summary
The paper presents StrassenNet, a neural architecture that learns fast matrix multiplication algorithms. It reproduces Strassen's rank-7 algorithm for 2x2 multiplication and explores 3x3 multiplication at ranks 19 through 23, yielding insights into the rank of the matrix multiplication tensor and its implications for computational efficiency.
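For context, the rank-7 algorithm the network rediscovers is Strassen's 1969 scheme, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8. A minimal sketch (the function name `strassen_2x2` is illustrative, not from the paper):

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    # The 7 products -- one per rank-1 term of the rank-7 decomposition.
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # The output entries are linear combinations of the 7 products.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])
```

Applied recursively to block matrices, this recurrence gives the O(n^{log2 7}) ≈ O(n^{2.81}) complexity that motivates the search for low-rank decompositions in the first place.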
Why It Matters
This research is significant as it combines neural networks with classical algorithms to enhance computational efficiency in matrix multiplication, a fundamental operation in many scientific and engineering applications. Understanding the rank of multiplication tensors can lead to improved algorithms and performance in various domains, including machine learning and data science.
Key Takeaways
- StrassenNet effectively reproduces the Strassen algorithm for 2x2 matrix multiplication.
- For 3x3 multiplication, the architecture exhibits a clear numerical threshold: models trained at rank 23 attain significantly lower validation error than those at rank 22 or below, suggesting that 23 may be the smallest effective rank.
- The findings suggest potential extensions to border-rank decompositions, which could influence future research in matrix multiplication.
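The "rank" in these takeaways refers to the tensor-rank formulation the paper builds on: an exact algorithm with r multiplications is the same thing as a rank-r decomposition of the matrix multiplication tensor. A sketch of that correspondence for the 2x2 case, using Strassen's factor matrices (the variable names `U`, `V`, `W`, `T` are illustrative, not the paper's notation):

```python
import numpy as np

# 2x2 matrix multiplication tensor T: T[idx(i,j), idx(j,k), idx(i,k)] = 1,
# with row-major index idx(i,j) = 2*i + j, so C[i,k] = sum_j A[i,j] * B[j,k].
T = np.zeros((4, 4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[2 * i + j, 2 * j + k, 2 * i + k] = 1

# Strassen's factors: product p is (U[p] . vec(A)) * (V[p] . vec(B)),
# and the output entries are vec(C) = W @ (m1, ..., m7).
U = np.array([[1, 0, 0, 1],    # m1 = (a + d)(e + h)
              [0, 0, 1, 1],    # m2 = (c + d) e
              [1, 0, 0, 0],    # m3 = a (f - h)
              [0, 0, 0, 1],    # m4 = d (g - e)
              [1, 1, 0, 0],    # m5 = (a + b) h
              [-1, 0, 1, 0],   # m6 = (c - a)(e + f)
              [0, 1, 0, -1]])  # m7 = (b - d)(g + h)
V = np.array([[1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, -1],
              [-1, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1, -1, 0, 1],   # C11 = m1 + m4 - m5 + m7
              [0, 0, 1, 0, 1, 0, 0],    # C12 = m3 + m5
              [0, 1, 0, 1, 0, 0, 0],    # C21 = m2 + m4
              [1, -1, 1, 0, 0, 1, 0]])  # C22 = m1 - m2 + m3 + m6

# Sum of 7 rank-1 terms u_p (x) v_p (x) w_p reconstructs T exactly.
T_hat = np.einsum('pi,pj,kp->ijk', U, V, W)
```

StrassenNet effectively learns factor triples like (U, V, W) from data; for the 2x2 tensor, rank 7 is provably optimal, which is why every run converging to a rank-7 decomposition recovers Strassen's algorithm.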
Mathematics > Algebraic Geometry
arXiv:2602.21797 (math)
[Submitted on 25 Feb 2026]
Title: Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
Authors: Paolo Andreini, Alessandra Bernardi, Monica Bianchini, Barbara Toniella Corradini, Sara Marziali, Giacomo Nunziati, Franco Scarselli
Abstract: Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$ multiplication. Across many independent runs the network always converges to a rank-$7$ tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ may be the smallest effective rank of the $3\times 3$ matrix--multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$-parametrisation and report preliminary results consistent with the known bounds for the border rank of the $3\times 3$ matrix--multiplication tensor.
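The $\varepsilon$-parametrisation mentioned in the abstract refers to the standard device behind border rank: a tensor of rank $r$ can sometimes be approximated arbitrarily well by tensors of strictly smaller rank, with the low-rank factors depending on a parameter $\varepsilon$ that tends to zero. The paper's construction for the $3\times 3$ tensor is not reproduced here; the classic textbook illustration is the rank-3 "W tensor", which has border rank 2 (the names `outer3` and `rank2_approx` are illustrative):

```python
import numpy as np

def outer3(u, v, w):
    """Rank-1 tensor u (x) v (x) w."""
    return np.einsum('i,j,k->ijk', u, v, w)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# The W tensor x(x)x(x)y + x(x)y(x)x + y(x)x(x)x has rank 3 but border rank 2.
T_true = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

def rank2_approx(eps):
    """Rank-2 family ((x + eps*y)^(x)3 - x^(x)3) / eps, which tends to T_true as eps -> 0."""
    z = x + eps * y
    return (outer3(z, z, z) - outer3(x, x, x)) / eps

def approx_error(eps):
    return np.linalg.norm(rank2_approx(eps) - T_true)
```

Expanding the difference quotient shows the error is $O(\varepsilon)$, so the rank-2 family converges to the rank-3 tensor; an $\varepsilon$-parametrised network can in principle learn such families, which is what makes border-rank extensions of StrassenNet plausible.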