[2604.01683] Coupled Query-Key Dynamics for Attention
Computer Science > Machine Learning
arXiv:2604.01683 (cs)
[Submitted on 2 Apr 2026]

Title: Coupled Query-Key Dynamics for Attention
Authors: Barak Gahtan, Alex M. Bronstein

Abstract: Standard scaled dot-product attention computes scores from static, independent projections of the input. We show that evolving queries and keys \emph{jointly} through shared learned dynamics before scoring - which we call \textbf{coupled QK dynamics} - improves language modeling perplexity and training stability. On WikiText-103 at 60M parameters, coupled dynamics achieves 22.55--22.62 perplexity vs.\ 24.22 for standard attention ($-$6.6--6.9\%), with only 0.11\% additional parameters (shared across both instantiations). A structural ablation isolates coupling as the active ingredient: a symplectic (Hamiltonian) and a non-symplectic (Euler) integrator perform identically when both couple Q and K, while an uncoupled MLP baseline of matched capacity reaches only 23.81 with 8$\times$ higher seed variance. The integration step count (1--7) is similarly irrelevant - a single coupled step suffices. A compute-matched comparison reveals that coupling is a \emph{sample-efficiency} mechanism: standard attention trained for 2.4$\times$ longer (matching wall-clock) reaches the same perplexity, but requires 2.4$\times$ more tokens. The advantage scales to 150M ($-$6.7\%) bu...
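The abstract does not give the exact form of the dynamics, but the described mechanism - a single explicit-Euler step in which Q evolves under a field of K and vice versa, through a map shared by both streams - can be sketched as follows. Everything here (the shared linear map `W`, the `tanh` vector field, the step size `eps`) is a hypothetical stand-in for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 4, 8     # sequence length, head dimension
eps = 0.1       # integration step size

# Hypothetical shared dynamics: one linear map W drives BOTH streams,
# which is consistent with the small (shared) parameter overhead the
# abstract reports.
W = rng.standard_normal((d, d)) / np.sqrt(d)

def coupled_euler_step(Q, K, eps=eps):
    """One explicit-Euler step of coupled QK dynamics:
    the update of Q depends on K, and the update of K depends on Q,
    through the shared map W (a non-symplectic integrator)."""
    dQ = np.tanh(K @ W)   # K drives Q
    dK = np.tanh(Q @ W)   # Q drives K
    return Q + eps * dQ, K + eps * dK

def attention_scores(Q, K):
    """Standard scaled dot-product scores."""
    return (Q @ K.T) / np.sqrt(d)

Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Standard attention scores Q and K directly; coupled dynamics evolves
# them jointly first (a single step, per the abstract's ablation).
Qc, Kc = coupled_euler_step(Q, K)
scores = attention_scores(Qc, Kc)
print(scores.shape)
```

The abstract's ablation suggests the coupling itself, not the integrator's structure, is what matters: a symplectic variant of `coupled_euler_step` would be expected to behave the same, while replacing the cross-terms with per-stream MLPs of matched capacity would not.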