[2604.04384] Compressible Softmax-Attended Language under Incompressible Attention
Computer Science > Computation and Language
arXiv:2604.04384 (cs) [Submitted on 6 Apr 2026]

Title: Compressible Softmax-Attended Language under Incompressible Attention
Authors: Wonsuk Lee

Abstract: Across every attention head in five transformer language models (124M--7B parameters, four architecture families), the logit energy field $\tilde{E}$ reaches 90\% of its variance in 2--11 singular components. The \emph{learned} interaction matrix $W_Q^\mathrm{T} W_K$ needs 38--75 components for the same threshold out of $d_h \in \{64, 128\}$, a gap of $5$--$25\times$ in effective rank. The attention mechanism allocates capacity uniformly across all $d_h$ dimensions, but language concentrates the actual interaction into a few. The compressibility of softmax-attended language is a property of the data, not of the frame that analyzes it.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes: 68T01
ACM classes: I.2.0
Cite as: arXiv:2604.04384 [cs.CL] (or arXiv:2604.04384v1 [cs.CL] for this version), https://doi.org/10.48550/arXiv.2604.04384 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Mon, 6 Apr 2026 03:18:27 UTC (8 KB), from Wonsuk Lee
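The effective-rank comparison in the abstract amounts to counting how many singular components of each matrix are needed to reach 90% of its total variance. The sketch below illustrates that measurement; it is not the paper's code. It assumes NumPy, the helper `components_for_variance` and the synthetic matrices are hypothetical, and the random interaction matrix and low-rank energy surrogate merely stand in for the learned $W_Q^\mathrm{T} W_K$ and the data-derived field $\tilde{E}$.

```python
import numpy as np


def components_for_variance(M: np.ndarray, threshold: float = 0.90) -> int:
    """Smallest k such that the top-k singular values capture `threshold`
    of the total variance (sum of squared singular values)."""
    s = np.linalg.svd(M, compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, threshold) + 1)


# Toy setup for a single attention head with d_h = 64 (one of the two head
# sizes mentioned in the abstract); d_model and the seed are arbitrary.
d_h, d_model = 64, 768
rng = np.random.default_rng(0)

# Hypothetical learned projections; the d_h x d_h interaction matrix
# W_Q^T W_K has at most d_h nonzero singular values.
W_Q = rng.standard_normal((d_model, d_h)) / np.sqrt(d_model)
W_K = rng.standard_normal((d_model, d_h)) / np.sqrt(d_model)
interaction = W_Q.T @ W_K

# Low-rank surrogate for the data-dependent energy field: the paper's
# \tilde{E} is estimated from actual token representations, which this
# rank-4 stand-in only mimics to make the contrast visible.
U = rng.standard_normal((d_h, 4))
energy = U @ U.T

print("interaction matrix:", components_for_variance(interaction), "components for 90% variance")
print("energy field      :", components_for_variance(energy), "components for 90% variance")
```

With a full-rank random interaction matrix and a rank-4 surrogate, the printed counts differ by roughly an order of magnitude, the same kind of spectral gap the abstract reports between the learned weights and the data.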