[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks


arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.05498: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Computer Science > Artificial Intelligence

arXiv:2603.05498 (cs) [Submitted on 5 Mar 2026]

Title: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Authors: Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu

Abstract: We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-occurrence is largely an architectural artifact of modern Transformer design, and that the two phenomena serve related but distinct functions. Massive activations operate globally: they induce near-constant hidden representations that persist across layers, effectively functioning as implicit parameters of the model. Attention sinks operate locally: they modulate attention outputs across heads and bias individual heads toward short-range dependencies. We identify the pre-norm configuration as the key choice that enables the co-occurrence, and show that...
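Both phenomena from the abstract are easy to probe on any open model. Below is a minimal diagnostic sketch, not the authors' code: it loads gpt2 purely as a stand-in, reads hidden states and attention maps via the standard HuggingFace transformers outputs, and uses illustrative cutoffs (100x the median absolute activation, 3x the uniform attention share) that are assumptions of this sketch, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model; the paper studies these effects in Transformer LMs generally.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager")
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

# Massive activations: flag hidden-state entries whose magnitude dwarfs the
# layer's median absolute activation (the 100x cutoff is an arbitrary choice).
for layer, h in enumerate(out.hidden_states):
    a = h[0].abs()                                # [seq_len, hidden_dim]
    ratio = (a.max() / a.median()).item()
    if ratio > 100:
        tok_idx, chan = divmod(a.argmax().item(), a.shape[1])
        print(f"layer {layer}: activation {ratio:.0f}x median at token {tok_idx}, channel {chan}")

# Attention sinks: average attention mass each key token receives across all
# heads and query positions; a sink soaks up far more than the uniform share.
for layer, attn in enumerate(out.attentions):
    mass = attn[0].mean(dim=(0, 1))               # [seq_len]
    top = mass.argmax().item()
    if mass[top].item() > 3.0 / mass.shape[0]:    # >3x uniform, arbitrary cutoff
        print(f"layer {layer}: token {top} receives {mass[top].item():.1%} of mean attention")
```

In practice such a probe tends to flag the first token of the sequence in many layers, consistent with the abstract's point that sink tokens attract attention regardless of semantic relevance.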

Originally published on March 06, 2026. Curated by AI News.

