[2511.02077] Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Summary
This article presents One-Shot Dynamic Thresholding (OSDT) for diffusion language models, which improves decoding efficiency and accuracy by calibrating decoding thresholds on a single sequence and reusing them for subsequent inputs.
Why It Matters
OSDT addresses a limitation of existing parallel-decoding methods for masked diffusion language models, which rely on a static global confidence threshold. By replacing that static cutoff with thresholds calibrated once per task, it improves the accuracy-throughput trade-off, making it relevant for researchers and practitioners working on efficient language-model inference.
Key Takeaways
- OSDT improves decoding efficiency by calibrating thresholds on a single sequence and applying them to subsequent inputs with negligible overhead.
- Achieves significant gains in accuracy-throughput trade-offs across various datasets.
- Suggests that reusable task-level confidence signatures could enable broader algorithmic and systems innovations.
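The calibrate-once-then-reuse idea from the takeaways can be sketched in plain Python. This is a hypothetical illustration, not the paper's implementation: the function names `calibrate_thresholds` and `parallel_unmask`, and the fixed `margin` offset below each observed confidence, are all assumptions made for the example.

```python
def calibrate_thresholds(conf_trajectory, margin=0.05):
    # One-shot calibration (assumed rule): set each decoding step's
    # threshold just below the confidence observed at that step on a
    # single calibration sequence, clipped to [0, 1].
    return [min(1.0, max(0.0, c - margin)) for c in conf_trajectory]

def parallel_unmask(confidences, threshold):
    # Unmask every masked position whose confidence clears the step's
    # threshold; always unmask at least the single most confident
    # position so decoding makes progress.
    keep = [c >= threshold for c in confidences]
    if not any(keep):
        keep[confidences.index(max(confidences))] = True
    return keep

# Calibrate per-step thresholds once, then reuse them on later inputs.
thresholds = calibrate_thresholds([0.90, 0.80, 0.85])
step0_keep = parallel_unmask([0.92, 0.50, 0.88], thresholds[0])
```

Because calibration runs only once per task, its cost is amortized over all subsequent inputs, which is what makes the overhead negligible.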
Computer Science > Machine Learning — arXiv:2511.02077 (cs)
Submitted on 3 Nov 2025 (v1); last revised 15 Feb 2026 (this version, v3)
Title: Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Authors: Jucheng Shen, Yeonju Ro
Abstract: Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations.
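The motivating observation in the abstract, that confidence trajectories for different inputs within one dataset are near-identical under cosine similarity, can be checked with a few lines of Python. The trajectory values below are invented for illustration; only the cosine-similarity computation itself is standard.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical per-step mean-confidence trajectories for two different
# inputs from the same dataset; a similar shape yields a cosine
# similarity close to 1, the signal OSDT reuses across inputs.
traj_a = [0.62, 0.71, 0.80, 0.88, 0.94]
traj_b = [0.60, 0.70, 0.82, 0.87, 0.95]
similarity = cosine_similarity(traj_a, traj_b)
```

A similarity near 1 is what justifies calibrating thresholds on one sequence and applying them to the rest of the dataset.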