[2602.18417] Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures
Summary
This paper introduces a framework for sequence models whose hidden states live on closed subgroups of U(d), and derives recurrent and transformer architectures from a single shared skeleton.
Why It Matters
The framework treats the choice of subgroup as a drop-in replacement for the state space, tangent projection, and update map, so one derivation covers both RNNs and transformers. The experiments described in the abstract are limited to O(d) on two language-modeling datasets, Tiny Shakespeare and Penn Treebank. Wider gains in natural language processing or time-series analysis are a possibility rather than a reported result.
Key Takeaways
- Introduces a framework for sequence models based on subgroups of U(d).
- Derives recurrent and transformer architectures from a common structure.
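- Evaluates orthogonal-state (O(d)) RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings.
- Reports a linear-mixing extension in tangent space that applies across subgroup choices.
- Finds that this extension improves finite-budget performance in the current O(d) experiments.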
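The abstract describes a skeleton built from three parts: a state space, a tangent projection, and an update map. As a rough illustration of how those parts could fit together for O(d), the sketch below keeps a hidden state on the orthogonal group. It is not the paper's method. The skew-symmetric projection is the standard projection onto the Lie algebra of O(d), but the Cayley transform used as the update map, the input map `w_in`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def project_to_tangent(m):
    """Project a square matrix onto the skew-symmetric matrices (Lie algebra of O(d))."""
    return 0.5 * (m - m.T)


def cayley_update(h, a):
    """Move the orthogonal state h along the skew-symmetric direction a.

    Assumed update map: the Cayley transform (I - a/2)^{-1} (I + a/2) is
    orthogonal whenever a is skew-symmetric, so h stays on O(d).
    """
    eye = np.eye(h.shape[0])
    step = np.linalg.solve(eye - 0.5 * a, eye + 0.5 * a)
    return h @ step


def run_orthogonal_rnn(inputs, w_in, d):
    """Hypothetical recurrence: map each input to a d x d matrix, project, update."""
    h = np.eye(d)  # start at the group identity
    for x in inputs:
        raw = (w_in @ x).reshape(d, d)  # unconstrained direction from the input
        h = cayley_update(h, project_to_tangent(raw))
    return h


rng = np.random.default_rng(0)
d, n_in, steps = 4, 3, 10
w_in = 0.1 * rng.standard_normal((d * d, n_in))  # hypothetical input weights
inputs = rng.standard_normal((steps, n_in))

h_final = run_orthogonal_rnn(inputs, w_in, d)
# Largest deviation of h_final^T h_final from the identity matrix.
orth_error = np.max(np.abs(h_final.T @ h_final - np.eye(d)))
```

In this sketch, swapping the subgroup would mean replacing `project_to_tangent` and `cayley_update` together while leaving the recurrence loop unchanged, which is one way to read the "drop-in replacement" idea in the abstract.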
arXiv:2602.18417 (cs)
[Submitted on 20 Feb 2026]
Authors: Joshua Nunley
Abstract: This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
MSC classes: 68T07, 22E70
ACM classes: I.2.6; G.3
Cite as: arXiv:2602.18417 [cs.LG] (or arXiv:2602.18417v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2602.18417 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Joshua Nunley
[v1] Fri, 20 Feb 2026 18:35:43 UTC (34 KB)