[2604.01978] Homogenized Transformers

arXiv - Machine Learning 3 min read

About this article

Mathematics > Probability
arXiv:2604.01978 (math) [Submitted on 2 Apr 2026]

Title: Homogenized Transformers
Authors: Hugo Koubbi, Borjan Geshkovski, Philippe Rigollet

Abstract: We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, the residual stream defines a discrete-time interacting particle system on the unit sphere. We prove that, under suitable joint scalings of the depth, the residual step size, and the number of heads, this dynamics admits a nontrivial homogenized limit. Depending on the scaling, the limit is either deterministic or stochastic with common noise; in the mean-field regime, the latter leads to a stochastic nonlinear Fokker--Planck equation for the conditional law of a representative token. In the Gaussian setting, the limiting drift vanishes, making the homogenized dynamics explicit enough to study representation collapse. This yields quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering can be mitigated.

Subjects: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2604.01978 [math.PR] (or arXiv:2604.01978v1 [math.PR] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.01978
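The discrete-time particle system described in the abstract can be simulated directly: tokens live on the unit sphere, each layer draws fresh Gaussian attention weights for every head, and the residual update is projected back to the sphere. The sketch below is a minimal illustration, not the paper's exact model — the function name `simulate`, the 1/√d weight scaling, the softmax temperature `beta`, and the renormalization step are all assumptions chosen to match the abstract's verbal description.

```python
import numpy as np

def simulate(n=16, d=8, heads=4, layers=200, step=0.05, beta=1.0, seed=0):
    """Random multi-head attention dynamics on the unit sphere (illustrative)."""
    rng = np.random.default_rng(seed)
    # n tokens initialized uniformly on the unit sphere S^{d-1}
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(layers):
        drift = np.zeros_like(X)
        for _ in range(heads):
            # weights resampled independently across layers and heads,
            # as at initialization of training
            Q = rng.standard_normal((d, d)) / np.sqrt(d)
            K = rng.standard_normal((d, d)) / np.sqrt(d)
            V = rng.standard_normal((d, d)) / np.sqrt(d)
            # row-wise softmax attention with inverse temperature beta
            scores = beta * (X @ Q.T) @ (X @ K.T).T
            A = np.exp(scores - scores.max(axis=1, keepdims=True))
            A /= A.sum(axis=1, keepdims=True)
            drift += A @ (X @ V.T)
        # residual step averaged over heads, then projected back to the sphere
        X = X + (step / heads) * drift
        X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X

def mean_cosine(X):
    """Average pairwise cosine similarity; values near 1 indicate clustering."""
    G = X @ X.T
    n = len(X)
    return (G.sum() - n) / (n * (n - 1))
```

Tracking `mean_cosine` across depth for different choices of `d`, `n`, and `beta` gives a crude empirical view of the dimension/context-length/temperature trade-offs the paper quantifies rigorously.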

Originally published on April 03, 2026. Curated by AI News.

Related Articles

Machine Learning

HydraLM: 22× faster decoding and 16× smaller state memory in long-context inference experiments [P]

I’ve been experimenting with HydraLM, a long-context model for inference, and the numbers are getting a bit wild: the repo’s benchmark su...

Reddit - Machine Learning · 1 min
Machine Learning

How to know if a research-oriented role is for you? [D]

I’m currently a first-year Master’s student in Data Science & AI, and I’m trying to figure out whether a research-oriented career is ...

Reddit - Machine Learning · 1 min
Machine Learning

GPU Compass – open-source, real-time GPU pricing across 20+ clouds [P]

We maintain an open-source catalog of cloud GPU offerings (skypilot-catalog, Apache 2.0). It auto-fetches pricing from 20+ cloud APIs eve...

Reddit - Machine Learning · 1 min
Machine Learning

5 AI Models Tried to Scam Me. Some of Them Were Scary Good | WIRED

The cyber capabilities of AI models have experts rattled. AI’s social skills may be just as dangerous.

Wired - AI · 8 min