[2602.18948] Toward Manifest Relationality in Transformers via Symmetry Reduction
Summary
This paper proposes a framework for reducing internal redundancy in Transformer models via symmetry reduction: representations, attention mechanisms, and optimization dynamics are reformulated in terms of invariant relational quantities, eliminating redundant degrees of freedom by construction.
Why It Matters
The work matters because Transformer architectures, now standard across machine learning, carry substantial parameter redundancy arising from coordinate-dependent representations. Removing that redundancy by construction could yield more parameter-efficient models and cleaner analyses of optimization, benefiting applications from natural language processing to computer vision.
Key Takeaways
- Transformers exhibit internal redundancy due to coordinate-dependent representations.
- The proposed symmetry reduction framework reformulates attention mechanisms.
- Invariant relational quantities can eliminate redundant degrees of freedom.
- This approach may lead to more efficient model architectures.
- The framework provides a principled geometric lens for analyzing optimization dynamics.
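To make the first takeaway concrete, here is a minimal NumPy sketch (not taken from the paper) of the kind of continuous symmetry it refers to: in a single attention head, inserting any invertible matrix between the query and key projections leaves the attention logits unchanged, so those parameter directions are redundant.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # head dimension (illustrative choice)
n = 5   # sequence length (illustrative choice)

X = rng.normal(size=(n, d))    # token representations
Wq = rng.normal(size=(d, d))   # query projection
Wk = rng.normal(size=(d, d))   # key projection

Q, K = X @ Wq, X @ Wk
scores = Q @ K.T               # pre-softmax attention logits

# Head-space symmetry: transform Wq -> Wq A and Wk -> Wk A^{-T} for any
# invertible A. The logits X Wq A A^{-1} Wk^T X^T are unchanged, so this
# whole family of parameter settings is physically equivalent.
A = rng.normal(size=(d, d))
Q2, K2 = X @ (Wq @ A), X @ (Wk @ np.linalg.inv(A).T)
assert np.allclose(Q2 @ K2.T, scores)
```

The logits depend only on pairwise inner products between transformed tokens, which is what the paper would call an invariant relational quantity.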
Computer Science > Machine Learning
arXiv:2602.18948 (cs) [Submitted on 21 Feb 2026]
Title: Toward Manifest Relationality in Transformers via Symmetry Reduction
Authors: J. François, L. Ravera
Abstract: Transformer models contain substantial internal redundancy arising from coordinate-dependent representations and continuous symmetries, in model space and in head space, respectively. While recent approaches address this by explicitly breaking symmetry, we propose a complementary framework based on symmetry reduction. We reformulate representations, attention mechanisms, and optimization dynamics in terms of invariant relational quantities, eliminating redundant degrees of freedom by construction. This perspective yields architectures that operate directly on relational structures, providing a principled geometric framework for reducing parameter redundancy and analyzing optimization.
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)
Cite as: arXiv:2602.18948 [cs.LG] (arXiv:2602.18948v1 for this version), https://doi.org/10.48550/arXiv.2602.18948
Submission history: [v1] from Lucrezia Ravera, Sat, 21 Feb 2026
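The abstract's phrase "eliminating redundant degrees of freedom by construction" can be illustrated with a hedged sketch (an assumption about one natural instance, not the paper's actual construction): instead of storing separate query and key projections, which carry the redundancy above, one can parameterize the attention logits directly by the invariant combination M = Wq Wk^T.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_head, n = 16, 4, 6   # illustrative sizes

# Standard parameterization: separate Wq, Wk, each (d_model, d_head).
Wq = rng.normal(size=(d_model, d_head))
Wk = rng.normal(size=(d_model, d_head))
X = rng.normal(size=(n, d_model))
logits_std = (X @ Wq) @ (X @ Wk).T

# Symmetry-reduced parameterization: the logits only ever see the
# invariant product M = Wq Wk^T (rank <= d_head), so storing M directly
# removes the redundant GL(d_head) directions by construction.
M = Wq @ Wk.T
logits_reduced = X @ M @ X.T
assert np.allclose(logits_std, logits_reduced)
```

In practice M would be kept low-rank (e.g. as a factored matrix up to the symmetry); the point of the sketch is only that the reduced object carries exactly the information the logits use, and nothing redundant.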