[2604.00235] MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation


arXiv - AI 4 min read


Computer Science > Machine Learning
arXiv:2604.00235 (cs) [Submitted on 31 Mar 2026]

Title: MAC-Attention: a Match-Amend-Complete Scheme for Fast and Accurate Attention Computation
Authors: Jinghan Yao, Sam Adé Jacobs, Walid Krichene, Masahiro Tanaka, Dhabaleswar K Panda

Abstract: Long-context decoding in LLMs is IO-bound: each token re-reads an ever-growing KV cache. Prior accelerations cut bytes via compression, which lowers fidelity, or via selection/eviction, which restricts what remains accessible; both can degrade delayed recall and long-form generation. We introduce MAC-Attention, a fidelity- and access-preserving alternative that accelerates decoding by reusing prior attention computations for semantically similar recent queries. A match stage performs pre-RoPE L2 matching over a short local window; an amend stage rectifies the reused attention by recomputing a small band near the match boundary; and a complete stage fuses the rectified results with fresh attention computed on the KV tail through a numerically stable merge. On a match hit, the compute and bandwidth complexity is constant regardless of context length. The method is model-agnostic and composes with IO-aware kernels, paged-KV managers, and MQA/GQA. Across LongBench v2 (120K), RULER (120K), and LongGenBe...
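The abstract names the three stages but not their formulas. The two parts that can be reconstructed from standard technique are the L2 match over a window of recent queries and the numerically stable merge of softmax partials (the same log-sum-exp merge used by FlashAttention-style kernels). The sketch below illustrates both under those assumptions; every function name, the threshold `tau`, and the data layout are hypothetical, not the paper's actual kernels.

```python
import numpy as np

def match(q, recent_queries, tau=0.1):
    """Match stage (sketch): nearest recent pre-RoPE query by L2 distance.

    Returns the index of the best match if within threshold `tau`
    (an assumed parameter), else None for a match miss.
    """
    if len(recent_queries) == 0:
        return None
    d = np.linalg.norm(np.asarray(recent_queries) - q, axis=1)
    i = int(d.argmin())
    return i if d[i] <= tau else None

def attn_partial(q, K, V):
    """Attention over one KV block, kept in mergeable (o, m, l) form:
    un-normalized output o, running score max m, softmax denominator l."""
    s = K @ q                # scores against this block
    m = s.max()              # running max for numerical stability
    p = np.exp(s - m)        # stabilized exponentials
    return p @ V, m, p.sum()

def merge(part1, part2):
    """Complete stage (sketch): numerically stable merge of two partials,
    e.g. a reused/amended block and fresh attention on the KV tail."""
    o1, m1, l1 = part1
    o2, m2, l2 = part2
    m = max(m1, m2)
    a1, a2 = np.exp(m1 - m), np.exp(m2 - m)
    return (a1 * o1 + a2 * o2) / (a1 * l1 + a2 * l2)
```

Merging partials over any split of the KV cache reproduces full-softmax attention exactly, which is consistent with the paper's claim that the complete stage is fidelity-preserving rather than approximate.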

Originally published on April 02, 2026. Curated by AI News.
