[2506.10848] Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles


arXiv - AI 4 min read

About this article


Computer Science > Computation and Language

arXiv:2506.10848 (cs) [Submitted on 12 Jun 2025 (v1), last revised 31 Mar 2026 (this version, v3)]

Title: Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

Authors: Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Puyu Zeng, Yuxuan Wang, Biqing Qi, Dongrui Liu, Linfeng Zhang

Abstract: Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: the certainty principle, the convergence principle, and the positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63× …
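The abstract describes the mechanism only at a high level: alternate between a cautious exploratory ("slow") phase that commits individual high-confidence tokens and an accelerated ("fast") phase that commits a contiguous high-confidence span in parallel. The toy Python below is a minimal sketch of that idea, assuming a mock model whose per-position confidences grow over steps and favor earlier positions (a stand-in for the certainty and positional principles; the convergence principle is not modeled). The function names, confidence schedule, threshold, and run-length cutoff are all invented for illustration and are not the authors' implementation.

```python
MASK = None  # sentinel for an undecoded position

def mock_model(seq, step):
    """Return {position: (token, confidence)} for each masked position.

    Toy stand-in for a dLLM forward pass: confidence grows with the
    denoising step and is higher toward the left of the sequence.
    """
    preds = {}
    for i, tok in enumerate(seq):
        if tok is MASK:
            conf = min(1.0, 0.3 + 0.2 * step - 0.05 * i)
            preds[i] = (f"tok{i}", max(conf, 0.0))
    return preds

def slowfast_sample(seq_len, threshold=0.9, fast_min_run=3, max_steps=20):
    """Decode a masked sequence by alternating slow and fast phases."""
    seq = [MASK] * seq_len
    step = 0
    while MASK in seq and step < max_steps:
        preds = mock_model(seq, step)
        # certainty principle (toy version): only positions above threshold
        confident = sorted(i for i, (_, c) in preds.items() if c >= threshold)
        # positional principle (toy version): keep only the leftmost
        # contiguous run of confident positions
        run = []
        for i in confident:
            if not run or i == run[-1] + 1:
                run.append(i)
            else:
                break
        if len(run) >= fast_min_run:
            # fast phase: commit the whole high-confidence span in parallel
            for i in run:
                seq[i] = preds[i][0]
        elif run:
            # slow phase: commit a single token cautiously
            seq[run[0]] = preds[run[0]][0]
        step += 1
    return seq, step

seq, steps = slowfast_sample(8)
```

Under this contrived confidence schedule, the decoder spends a few steps in the slow phase until confidences rise, then clears the remaining positions in large fast-phase spans, which is the qualitative behavior the abstract claims.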

Originally published on April 01, 2026. Curated by AI News.

Related Articles


Agents Can Now Propose and Deploy Their Own Code Changes

150 clones yesterday. 43 stars in 3 days. Every agent framework you've used (LangChain, LangGraph, Claude Code) assumes agents are tools ...

Reddit - Artificial Intelligence · 1 min ·

[2603.17839] How do LLMs Compute Verbal Confidence

Abstract page for arXiv paper 2603.17839: How do LLMs Compute Verbal Confidence

arXiv - AI · 4 min ·

[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Abstract page for arXiv paper 2603.15970: 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight...

arXiv - AI · 4 min ·

[2603.10062] Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Abstract page for arXiv paper 2603.10062: Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

arXiv - AI · 3 min ·

