[2604.01683] Coupled Query-Key Dynamics for Attention

[2604.01683] Coupled Query-Key Dynamics for Attention

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2604.01683: Coupled Query-Key Dynamics for Attention

Computer Science > Machine Learning arXiv:2604.01683 (cs) [Submitted on 2 Apr 2026] Title:Coupled Query-Key Dynamics for Attention Authors:Barak Gahtan, Alex M. Bronstein View a PDF of the paper titled Coupled Query-Key Dynamics for Attention, by Barak Gahtan and 1 other authors View PDF HTML (experimental) Abstract:Standard scaled dot-product attention computes scores from static, independent projections of the input. We show that evolving queries and keys \emph{jointly} through shared learned dynamics before scoring - which we call \textbf{coupled QK dynamics} - improves language modeling perplexity and training stability. On WikiText-103 at 60M parameters, coupled dynamics achieves 22.55--22.62 perplexity vs.\ 24.22 for standard attention ($-$6.6--6.9\%), with only 0.11\% additional parameters (shared across both instantiations). A structural ablation isolates coupling as the active ingredient: a symplectic (Hamiltonian) and a non-symplectic (Euler) integrator perform identically when both couple Q and K, while an uncoupled MLP baseline of matched capacity reaches only 23.81 with 8$\times$ higher seed variance. The integration step count (1--7) is similarly irrelevant - a single coupled step suffices. A compute-matched comparison reveals that coupling is a \emph{sample-efficiency} mechanism: standard attention trained for 2.4$\times$ longer (matching wall-clock) reaches the same perplexity, but requires 2.4$\times$ more tokens. The advantage scales to 150M ($-$6.7\%) bu...

Originally published on April 03, 2026. Curated by AI News.

Related Articles

I used Jeff Bezos' Day 1 rule with ChatGPT to beat procrastination
Llms

I used Jeff Bezos' Day 1 rule with ChatGPT to beat procrastination

I used Jeff Bezos’ Day 1 rule with ChatGPT to stop procrastinating. These simple prompts helped me start faster, overthink less and get m...

AI Tools & Products · 9 min ·
Llms

ChatGPT and Claude? The Real-World AI Buzz Is Elsewhere

Please make sure your browser supports JavaScript and cookies and that you are not blocking them from loading. ...

AI Tools & Products · 1 min ·
Anthropic investigates unauthorized access to restricted Claude Mythos AI model
Llms

Anthropic investigates unauthorized access to restricted Claude Mythos AI model

Anthropic investigates unauthorized access to restricted Claude Mythos AI model - SiliconANGLE

AI Tools & Products · 5 min ·
Llms

Arc Sentry outperformed LLM Guard 92% vs 70% detection on a head to head benchmark. Here is how it works.

I built Arc Sentry, a pre-generation prompt injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime