[2602.18417] Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

arXiv - Machine Learning · 3 min read

Summary

This paper introduces a framework for sequence models whose hidden states live on closed subgroups of U(d), deriving both recurrent and transformer architectures from a single shared skeleton.

Why It Matters

The research offers a principled approach to sequence modeling: constraining hidden states to a matrix group lets the state space, tangent projection, and update map follow directly from the choice of subgroup rather than being designed ad hoc. This has implications for machine-learning applications such as natural language processing and time-series analysis.

Key Takeaways

  • Introduces a framework for sequence models whose hidden states live on closed subgroups of U(d).
  • Derives recurrent and transformer templates from a shared skeleton in which the choice of subgroup fixes the state space, tangent projection, and update map (see the sketch after this list).
  • Specializes to O(d) and evaluates orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings.
  • Reports a general linear-mixing extension in tangent space that improves finite-budget performance in the O(d) experiments.
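
The shared skeleton is easiest to see in code. Below is a minimal sketch, assuming PyTorch and specializing to O(d) as in the paper's experiments: the tangent projection maps onto the skew-symmetric matrices so(d), and the update map is a group multiplication with a matrix exponential. The names skew and OrthogonalRNNCell are illustrative, not from the paper.

```python
import torch


def skew(a: torch.Tensor) -> torch.Tensor:
    """Project a d x d matrix onto so(d), the skew-symmetric matrices:
    the tangent space (Lie algebra) of O(d) at the identity."""
    return 0.5 * (a - a.transpose(-1, -2))


class OrthogonalRNNCell(torch.nn.Module):
    """Recurrent cell whose hidden state h is a d x d orthogonal matrix.

    Each input is projected to a tangent direction A in so(d), and the
    state is updated by h <- h @ exp(A), which stays in O(d) because the
    exponential of a skew-symmetric matrix is orthogonal.
    """

    def __init__(self, input_dim: int, d: int):
        super().__init__()
        self.d = d
        self.to_tangent = torch.nn.Linear(input_dim, d * d)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_dim); h: (batch, d, d), orthogonal.
        a = skew(self.to_tangent(x).view(-1, self.d, self.d))  # tangent projection
        return h @ torch.matrix_exp(a)                          # update map
```

Swapping in a different closed subgroup of U(d) changes only the projection and, where needed, the exponential; for the full unitary group, for instance, the projection would target skew-Hermitian matrices. That substitution is the "drop-in replacement" described above.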

Computer Science > Machine Learning

arXiv:2602.18417 (cs) [Submitted on 20 Feb 2026]

Title: Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures
Authors: Joshua Nunley

Abstract: This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
MSC classes: 68T07, 22E70
ACM classes: I.2.6; G.3
Cite as: arXiv:2602.18417 [cs.LG] (arXiv:2602.18417v1 for this version); https://doi.org/10.48550/arXiv.2602.18417 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Fri, 20 Feb 2026 18:35:43 UTC (34 KB), submitted by Joshua Nunley
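
The abstract does not spell out the form of the linear-mixing extension, but its key property, that it "applies across subgroup choices", follows from the tangent space of any Lie group being a vector space: linear combinations of tangent vectors are again tangent vectors, so mixing before the retraction never leaves the group. Below is a hedged sketch of one plausible form, mixing K tangent directions with a learned matrix; the class name TangentMixer and the per-head layout are illustrative assumptions, not the paper's parameterization.

```python
import torch


class TangentMixer(torch.nn.Module):
    """Linearly mix K tangent directions (e.g., per-head or per-channel
    updates) with a learned K x K matrix before retracting to the group.

    Because the tangent space is a vector space, every mixed output is
    again a valid tangent vector, for any choice of subgroup.
    """

    def __init__(self, k: int):
        super().__init__()
        self.weights = torch.nn.Parameter(torch.eye(k))  # start as "no mixing"

    def forward(self, tangents: torch.Tensor) -> torch.Tensor:
        # tangents: (batch, K, d, d), each (d, d) slice in the tangent
        # space (skew-symmetric in the O(d) case). The einsum combines
        # slices across the K axis only, so each output slice is a linear
        # combination of tangent vectors and stays in the tangent space.
        return torch.einsum("jk,bkmn->bjmn", self.weights, tangents)
```

For example, with tangents of shape (batch, K, d, d) holding K skew-symmetric updates, TangentMixer(K)(tangents) returns K mixed updates that are still skew-symmetric, each of which can be retracted with torch.matrix_exp exactly as in the recurrent cell above.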

