[2604.01622] Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Computer Science > Machine Learning
arXiv:2604.01622 (cs)
[Submitted on 2 Apr 2026]

Title: Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Authors: Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu

Abstract: Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, leading to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is a better fit for DLMs: it provides deterministic load balancing by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation according to the denoising step. We find that allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and provide a mechanistic explanation: tokens in low-mask-ratio contexts exhibit an order-of-magnitude higher learning efficiency, so concentrating compute on these steps yields the largest marginal return. Finally, we show that existing pre...
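As a rough illustration of the mechanism the abstract describes (not the authors' implementation), expert-choice routing can be sketched as follows: each expert selects its top-capacity tokens by router score, so every expert's load equals its capacity by construction, and that capacity can be varied per denoising step. The function and parameter names (`expert_choice_route`, `timestep_capacity`, `alpha`) are our own placeholders, and the capacity schedule is a hypothetical one that simply grows as the mask ratio shrinks.

```python
import random

def expert_choice_route(scores, capacity):
    """Expert-choice routing: each expert picks its top-`capacity` tokens
    by router score, so every expert processes exactly `capacity` tokens --
    balanced by design, unlike token-choice routing where popular experts
    can overflow.

    scores: list of per-token score lists, shape [num_tokens][num_experts].
    Returns a 0/1 dispatch mask of the same shape.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    mask = [[0] * num_experts for _ in range(num_tokens)]
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: -scores[t][e])
        for t in ranked[:capacity]:
            mask[t][e] = 1
    return mask

def timestep_capacity(mask_ratio, base_capacity, alpha=1.0):
    """Hypothetical timestep-dependent schedule: allocate more capacity at
    low-mask-ratio (late) denoising steps, where the abstract reports the
    highest learning efficiency per token."""
    return max(1, round(base_capacity * (1.0 + alpha * (1.0 - mask_ratio))))

random.seed(0)
scores = [[random.random() for _ in range(4)] for _ in range(8)]  # 8 tokens, 4 experts
cap = timestep_capacity(mask_ratio=0.2, base_capacity=2)  # low mask ratio -> larger cap
mask = expert_choice_route(scores, cap)
loads = [sum(mask[t][e] for t in range(8)) for e in range(4)]
print(loads)  # every expert carries exactly `cap` tokens
```

Because each expert independently takes its top `capacity` tokens, the per-expert load is deterministic regardless of the score distribution; only the total compute changes as `timestep_capacity` varies across denoising steps.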