[2604.03260] Why Attend to Everything? Focus is the Key

arXiv - AI, April 07, 2026

About this article

Abstract page for arXiv paper 2604.03260: Why Attend to Everything? Focus is the Key

Subject: Computer Science, Computation and Language. arXiv:2604.03260 (cs), submitted on 12 Mar 2026.

Authors: Hengshuai Yao, Xing Chen, Ahmed Murtadha, Jin Li, Shuai Shao, Yasin Abbasi Yadkori, Guan Wang, Mingli Yuan, William Chen, Sen Song

Abstract: We introduce Focus, a method that learns which token pairs matter rather than approximating all of them. Learnable centroids assign tokens to groups; distant attention is restricted to same-group pairs while local attention operates at full resolution. Because all model weights stay frozen, Focus is purely additive: centroid-only training (as few as 148K parameters) improves domain perplexity with zero degradation on downstream benchmarks, from 124M to 70B parameters and across five attention architectures. No existing efficient attention method achieves this in the retrofit setting. At 124M, Focus surpasses full attention (30.3 vs 31.4 PPL); trained from scratch at 7B scale (2B tokens), Focus again beats full attention (13.82 vs 13.89 PPL). At inference, restricting each token to its top-k highest-scoring groups discretizes the soft routing into a hard sparsity pattern, yielding 2x speedup while beating the pretrained baseline (41.3 vs 42.8 PPL); decomposing this pattern into two standard FlashAttention calls reaches 8.6x wall-clock speedup at 1M token...
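The abstract describes the inference-time sparsity pattern only at a high level. As a rough illustration, here is a minimal sketch of such a pattern under several assumptions that do not come from the paper: tokens are scored against centroids by a plain dot product, each token keeps its top-k groups, "local attention" is a causal sliding window of size `window`, and two distant tokens may attend to each other if their group sets overlap. The centroids are fixed toy values rather than learned, the soft routing used during training is not modeled, and the names `top_k_groups`, `focus_mask`, and `masked_attention` are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Set

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_groups(token: Vector, centroids: List[Vector], k: int) -> Set[int]:
    """Hard routing: indices of the k centroids that score highest for this token.

    Assumption: the routing score is a dot product with each centroid.
    Ties are broken by centroid index because sorted() is stable.
    """
    scores = [dot(token, c) for c in centroids]
    ranked = sorted(range(len(centroids)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


def focus_mask(tokens: List[Vector], centroids: List[Vector],
               window: int, k: int) -> List[List[bool]]:
    """Causal boolean mask: mask[i][j] is True if query i may attend to key j.

    A pair is allowed if it is local (inside the causal sliding window) or if
    the two tokens share at least one of their top-k groups (assumed rule).
    """
    if window < 1:
        raise ValueError("window must be >= 1 so every token can attend to itself")
    groups = [top_k_groups(t, centroids, k) for t in tokens]
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only keys at or before the query
            local = i - j < window
            shared = not groups[i].isdisjoint(groups[j])
            mask[i][j] = local or shared
    return mask


def masked_attention(queries: List[Vector], keys: List[Vector],
                     values: List[Vector], mask: List[List[bool]]) -> List[Vector]:
    """Single-head scaled softmax attention that skips masked-out keys."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for i, q in enumerate(queries):
        logits = [dot(q, kj) * scale if mask[i][j] else -math.inf
                  for j, kj in enumerate(keys)]
        m = max(logits)  # finite, because the diagonal is always allowed
        weights = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) / total
                    for d in range(len(values[0]))])
    return out


if __name__ == "__main__":
    # Toy data: 2-D token vectors and two hypothetical centroids.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    mask = focus_mask(tokens, centroids, window=1, k=1)
    for row in mask:
        # prints x...., .x..., x.x.., .x.x., x.x.x (one row per line)
        print("".join("x" if allowed else "." for allowed in row))
```

Setting `k` equal to the number of centroids makes every pair share a group, so the mask collapses to ordinary full causal attention; a smaller `k`, more centroids, or a narrower window makes it sparser. A practical implementation would not materialize an n×n mask at long context lengths; the abstract instead reports decomposing the pattern into two standard FlashAttention calls.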

Originally published on April 07, 2026. Curated by AI News.
