[2511.02077] Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Summary
This article presents One-Shot Dynamic Thresholding (OSDT) for diffusion language models, which improves decoding efficiency and accuracy by calibrating decoding thresholds on a single sequence and reusing them for subsequent inputs.
Why It Matters
OSDT addresses a limitation of existing parallel-decoding methods for masked diffusion language models, which rely on a static global confidence threshold. By replacing that static cutoff with thresholds calibrated once per task, it improves the accuracy-throughput trade-off, making it relevant for researchers and practitioners working on efficient language-model inference.
Key Takeaways
- OSDT improves decoding efficiency by calibrating thresholds on a single sequence and applying them to subsequent inputs with negligible overhead.
- Achieves significant gains in accuracy-throughput trade-offs across various datasets.
- Suggests that reusable task-level confidence signatures could enable broader algorithmic and systems innovations.
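The calibrate-once-then-reuse idea from the takeaways can be sketched in plain Python. This is a hypothetical illustration, not the paper's implementation: the function names `calibrate_thresholds` and `parallel_unmask`, and the fixed `margin` offset below each observed confidence, are all assumptions made for the example.

```python
def calibrate_thresholds(conf_trajectory, margin=0.05):
    # One-shot calibration (assumed rule): set each decoding step's
    # threshold just below the confidence observed at that step on a
    # single calibration sequence, clipped to [0, 1].
    return [min(1.0, max(0.0, c - margin)) for c in conf_trajectory]

def parallel_unmask(confidences, threshold):
    # Unmask every masked position whose confidence clears the step's
    # threshold; always unmask at least the single most confident
    # position so decoding makes progress.
    keep = [c >= threshold for c in confidences]
    if not any(keep):
        keep[confidences.index(max(confidences))] = True
    return keep

# Calibrate per-step thresholds once, then reuse them on later inputs.
thresholds = calibrate_thresholds([0.90, 0.80, 0.85])
step0_keep = parallel_unmask([0.92, 0.50, 0.88], thresholds[0])
```

Because calibration runs only once per task, its cost is amortized over all subsequent inputs, which is what makes the overhead negligible.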
Computer Science > Machine Learning — arXiv:2511.02077 (cs)
Submitted on 3 Nov 2025 (v1); last revised 15 Feb 2026 (this version, v3)
Title: Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Authors: Jucheng Shen, Yeonju Ro
Abstract: Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations.
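The motivating observation in the abstract, that confidence trajectories for different inputs within one dataset are near-identical under cosine similarity, can be checked with a few lines of Python. The trajectory values below are invented for illustration; only the cosine-similarity computation itself is standard.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical per-step mean-confidence trajectories for two different
# inputs from the same dataset; a similar shape yields a cosine
# similarity close to 1, the signal OSDT reuses across inputs.
traj_a = [0.62, 0.71, 0.80, 0.88, 0.94]
traj_b = [0.60, 0.70, 0.82, 0.87, 0.95]
similarity = cosine_similarity(traj_a, traj_b)
```

A similarity near 1 is what justifies calibrating thresholds on one sequence and applying them to the rest of the dataset.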