[2604.02340] Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

arXiv - Machine Learning, April 06, 2026

About this article

Computer Science > Machine Learning
arXiv:2604.02340 (cs)
[Submitted on 4 Feb 2026]

Title: Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

Authors: Ivan Sedykh, Nikita Sorokin, Valentin Malykh

Abstract: Recent advances in masked diffusion language models (MDLMs) narrow the quality gap to autoregressive LMs, but their sampling remains expensive because generation requires many full-sequence denoising passes with a large Transformer and, unlike autoregressive decoding, cannot benefit from KV caching. In this work, we exploit the flexibility of the diffusion framework and study model scheduling, where a smaller MDLM replaces the full model at a subset of denoising steps. On OpenWebText, we show that early and late denoising steps are substantially more robust to such replacement than middle steps, enabling up to a 17% reduction in FLOPs with only modest degradation in generative perplexity. We support these findings with a step-importance analysis based on loss and KL divergence between small and large models across timesteps, as well as an exhaustive search over coarse step segments, both of which identify the middle of the diffusion trajectory as most sensitive. Our results suggest that simple, architecture-agnostic scheduling rules can sign...
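To make the idea of model scheduling concrete, here is a minimal sketch of an "early and late steps use the small model" rule, together with a rough FLOPs estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the step counts, and the cost ratio between the small and large model are all hypothetical, and step 0 is assumed to be the first sampling step. The numbers are not meant to reproduce the 17% figure from the abstract.

```python
from typing import List


def build_schedule(num_steps: int, n_early: int, n_late: int) -> List[str]:
    """Pick a model per denoising step.

    Hypothetical rule: the first `n_early` and last `n_late` sampling steps
    use the small model; the middle steps keep the large model.
    """
    if n_early < 0 or n_late < 0 or n_early + n_late > num_steps:
        raise ValueError("n_early and n_late must be non-negative and fit in num_steps")
    schedule = []
    for step in range(num_steps):
        if step < n_early or step >= num_steps - n_late:
            schedule.append("small")
        else:
            schedule.append("large")
    return schedule


def relative_flops(schedule: List[str], small_cost_ratio: float) -> float:
    """Compute cost relative to running the large model at every step.

    Assumes one small-model step costs `small_cost_ratio` times one
    large-model step (an illustrative parameter, not a value from the paper).
    """
    n_small = schedule.count("small")
    n_large = len(schedule) - n_small
    return (n_large + small_cost_ratio * n_small) / len(schedule)


# Illustrative values only: 100 steps, small model on the first and last 20.
schedule = build_schedule(num_steps=100, n_early=20, n_late=20)
saving = 1.0 - relative_flops(schedule, small_cost_ratio=0.5)
print(f"small-model steps: {schedule.count('small')}, estimated FLOPs saving: {saving:.0%}")
```

In a real sampler, the per-step label would select which network performs that denoising pass. The achievable saving depends on how many steps can be swapped before generative perplexity degrades, which is what the paper's step-importance analysis is meant to determine.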

Originally published on April 06, 2026. Curated by AI News.

