[2602.14209] MAGE: All-[MASK] Block Already Knows Where to Look in Diffusion LLM

arXiv - Machine Learning · 3 min read

Summary

The paper presents MAGE, an approach to sparse attention for block diffusion LLMs that reduces memory access by using the first all-[MASK] denoising step to identify the important KV-cache entries, achieving significant speedups on long-context tasks.

Why It Matters

MAGE addresses a critical bottleneck in block diffusion LLMs: KV-cache memory access dominates inference cost in long-context settings. By cutting that cost with near-lossless accuracy, it makes long-context generation with this emerging model family substantially faster and cheaper.

Key Takeaways

  • MAGE uses attention from the first all-[MASK] denoising step to select important KV-cache entries in block diffusion LLMs.
  • The method achieves near-lossless accuracy with a fraction of the KV-cache budget.
  • MAGE delivers up to 3-4x end-to-end speedup on long-context benchmarks such as LongBench and Needle-in-a-Haystack.
  • A lightweight fine-tuning strategy further strengthens [MASK]-guided patterns at minimal cost (a few hours of training).
  • The approach consistently outperforms sparse attention methods designed for autoregressive LLMs.

Computer Science > Machine Learning
arXiv:2602.14209 (cs) · [Submitted on 15 Feb 2026]

Title: MAGE: All-[MASK] Block Already Knows Where to Look in Diffusion LLM
Authors: Omin Kwon, Yeonjae Kim, Doyeon Kim, Minseo Kim, Yeonhong Park, Jae W. Lee

Abstract: Block diffusion LLMs are emerging as a promising next paradigm for language generation, but their use of KV caching makes memory access a dominant bottleneck in long-context settings. While dynamic sparse attention has been actively explored, existing methods designed for autoregressive LLMs rely on approximate importance estimation and perform poorly when adapted to block diffusion. This work identifies a key opportunity unique to block diffusion: attention at the first all-[MASK] denoising step reliably predicts important KV entries and budget requirements, enabling MAGE to perform a single exact attention pass per block and reuse it for training-free sparse denoising. Across long-context benchmarks including LongBench and Needle-in-a-Haystack, MAGE achieves near-lossless accuracy with a fraction of the KV budget while delivering up to 3-4x end-to-end speedup, consistently outperforming AR-oriented sparse attention baselines. A lightweight fine-tuning strategy further strengthens [MASK]-guided patterns with minimal cost, requiring only a few hours of training on a si...
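The selection step the abstract describes (one exact attention pass with the all-[MASK] block, then reuse of the resulting sparse pattern) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the mean-pooling over the block, and the cumulative-attention-mass rule for the adaptive budget are all assumptions made for the sketch.

```python
import numpy as np

def select_kv_indices(q_block, k_cache, coverage=0.95):
    """Hypothetical sketch of [MASK]-guided KV selection.

    Runs one exact attention pass with the all-[MASK] query block,
    then keeps the smallest set of cached KV entries whose summed
    attention mass reaches `coverage`, giving an adaptive budget.
    """
    d = q_block.shape[-1]
    scores = q_block @ k_cache.T / np.sqrt(d)          # (block, cache)
    scores -= scores.max(axis=-1, keepdims=True)       # stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    importance = probs.mean(axis=0)                    # pool over the block
    order = np.argsort(importance)[::-1]               # most important first
    cum = np.cumsum(importance[order])
    budget = int(np.searchsorted(cum, coverage)) + 1   # adaptive KV budget
    return np.sort(order[:budget])

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))     # queries of the all-[MASK] block
kv = rng.normal(size=(256, 64))  # cached keys for the long context
keep = select_kv_indices(q, kv)
# Subsequent denoising steps for this block would attend only to
# k_cache[keep], skipping the rest of the KV cache.
```

The appeal of the idea, per the abstract, is that this pass is exact rather than an approximate importance estimate, and it is paid once per block rather than once per denoising step.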
