[2604.01978] Homogenized Transformers

arXiv - Machine Learning 3 min read

About this article

Mathematics > Probability
arXiv:2604.01978 (math) [Submitted on 2 Apr 2026]

Title: Homogenized Transformers
Authors: Hugo Koubbi, Borjan Geshkovski, Philippe Rigollet

Abstract: We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, the residual stream defines a discrete-time interacting particle system on the unit sphere. We prove that, under suitable joint scalings of the depth, the residual step size, and the number of heads, this dynamics admits a nontrivial homogenized limit. Depending on the scaling, the limit is either deterministic or stochastic with common noise; in the mean-field regime, the latter leads to a stochastic nonlinear Fokker--Planck equation for the conditional law of a representative token. In the Gaussian setting, the limiting drift vanishes, making the homogenized dynamics explicit enough to study representation collapse. This yields quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering can be mitigated.

Subjects: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2604.01978 [math.PR] (or arXiv:2604.01978v1 [math.PR] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.01978
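The discrete-time particle system described in the abstract can be simulated directly: tokens live on the unit sphere, each layer draws fresh Gaussian attention weights for every head, and the residual update is projected back to the sphere. The sketch below is a minimal illustration, not the paper's exact model — the function name `simulate`, the 1/√d weight scaling, the softmax temperature `beta`, and the renormalization step are all assumptions chosen to match the abstract's verbal description.

```python
import numpy as np

def simulate(n=16, d=8, heads=4, layers=200, step=0.05, beta=1.0, seed=0):
    """Random multi-head attention dynamics on the unit sphere (illustrative)."""
    rng = np.random.default_rng(seed)
    # n tokens initialized uniformly on the unit sphere S^{d-1}
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(layers):
        drift = np.zeros_like(X)
        for _ in range(heads):
            # weights resampled independently across layers and heads,
            # as at initialization of training
            Q = rng.standard_normal((d, d)) / np.sqrt(d)
            K = rng.standard_normal((d, d)) / np.sqrt(d)
            V = rng.standard_normal((d, d)) / np.sqrt(d)
            # row-wise softmax attention with inverse temperature beta
            scores = beta * (X @ Q.T) @ (X @ K.T).T
            A = np.exp(scores - scores.max(axis=1, keepdims=True))
            A /= A.sum(axis=1, keepdims=True)
            drift += A @ (X @ V.T)
        # residual step averaged over heads, then projected back to the sphere
        X = X + (step / heads) * drift
        X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X

def mean_cosine(X):
    """Average pairwise cosine similarity; values near 1 indicate clustering."""
    G = X @ X.T
    n = len(X)
    return (G.sum() - n) / (n * (n - 1))
```

Tracking `mean_cosine` across depth for different choices of `d`, `n`, and `beta` gives a crude empirical view of the dimension/context-length/temperature trade-offs the paper quantifies rigorously.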

Originally published on April 03, 2026. Curated by AI News.

Related Articles

Machine Learning

HydraLM: 22× faster decoding and 16× smaller state memory in long-context inference experiments [P]

I’ve been experimenting with HydraLM, a long-context model for inference, and the numbers are getting a bit wild: the repo’s benchmark su...

Reddit - Machine Learning · 1 min
Machine Learning

How to know if a research-oriented role is for you? [D]

I’m currently a first-year Master’s student in Data Science & AI, and I’m trying to figure out whether a research-oriented career is ...

Reddit - Machine Learning · 1 min
Machine Learning

GPU Compass – open-source, real-time GPU pricing across 20+ clouds [P]

We maintain an open-source catalog of cloud GPU offerings (skypilot-catalog, Apache 2.0). It auto-fetches pricing from 20+ cloud APIs eve...

Reddit - Machine Learning · 1 min
Machine Learning

5 AI Models Tried to Scam Me. Some of Them Were Scary Good | WIRED

The cyber capabilities of AI models have experts rattled. AI’s social skills may be just as dangerous.

Wired - AI · 8 min