[2602.15008] Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees
Summary
This paper studies the sampling efficiency of discrete diffusion models, establishing sharp convergence guarantees for $\tau$-leaping-based samplers under both uniform and masking noising processes and improving existing bounds.
Why It Matters
The study addresses the theoretical foundations of discrete diffusion models, which have shown striking empirical success in machine learning. By establishing sharp guarantees on sampling efficiency, it deepens the understanding of these models and informs their practical use, with potential impact on generative AI and data science.
Key Takeaways
- The paper establishes sharp convergence guarantees, in KL divergence, for discrete diffusion models sampled with $\tau$-leaping-based samplers (a minimal sketch of such a sampler follows this list).
- For uniform diffusion, the $\tilde O(d/\varepsilon)$ iteration complexity eliminates linear dependence on the vocabulary size $S$ and improves existing bounds by a factor of $d$; a matching algorithmic lower bound shows that linear dependence on the ambient dimension $d$ is unavoidable in general.
- For masking diffusion, a modified $\tau$-leaping sampler adapts to low-dimensional structure, yielding sublinear convergence rates on structured data.
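To make the $\tau$-leaping idea concrete, here is a minimal sketch of one step of such a sampler in Python. It is not the paper's algorithm: it assumes the common variant in which each coordinate's reverse-CTMC jump rates are frozen over a window of length $\tau$ and at most one jump per coordinate is applied, and the `rates` input stands in for quantities that would come from a learned score network in practice.

```python
import numpy as np

def tau_leaping_step(x, rates, tau, rng):
    """One tau-leaping step: freeze the reverse-CTMC jump rates over a
    window of length tau and apply, per coordinate, at most one jump drawn
    from the resulting transition kernel (a common simplification).

    x     : (d,) int array, current sequence over a vocabulary of size S
    rates : (d, S) array, rates[i, s] = frozen rate of coordinate i
            jumping to state s (the diagonal entry rates[i, x[i]] is ignored)
    """
    d, S = rates.shape
    r = rates.astype(float).copy()
    r[np.arange(d), x] = 0.0                   # no self-jumps
    total = r.sum(axis=1)                      # per-coordinate exit rate
    # Probability that coordinate i jumps at least once during the window
    p_jump = 1.0 - np.exp(-total * tau)
    jump = rng.random(d) < p_jump
    x_new = x.copy()
    for i in np.where(jump)[0]:
        # Conditional on jumping, the destination is drawn in proportion
        # to the frozen rates
        x_new[i] = rng.choice(S, p=r[i] / total[i])
    return x_new
```

Freezing the rates over each window is what turns the continuous-time chain into a discrete-step sampler; the paper's iteration-complexity results bound how many such windows are needed to reach $\varepsilon$ accuracy in KL.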
arXiv:2602.15008 (cs) [Submitted on 16 Feb 2026]
Title: Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees
Authors: Daniil Dmitriev, Zhihan Huang, Yuting Wei
Abstract: Diffusion models over discrete spaces have recently shown striking empirical success, yet their theoretical foundations remain incomplete. In this paper, we study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation, with a focus on $\tau$-leaping-based samplers. We establish sharp convergence guarantees for attaining $\varepsilon$ accuracy in Kullback-Leibler (KL) divergence for both uniform and masking noising processes. For uniform discrete diffusion, we show that the $\tau$-leaping algorithm achieves an iteration complexity of order $\tilde O(d/\varepsilon)$, with $d$ the ambient dimension of the target distribution, eliminating linear dependence on the vocabulary size $S$ and improving existing bounds by a factor of $d$; moreover, we establish a matching algorithmic lower bound showing that linear dependence on the ambient dimension is unavoidable in general. For masking discrete diffusion, we introduce a modified $\tau$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity...
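The abstract does not spell out the sampler's interface, so the following hypothetical driver loop (reusing `tau_leaping_step` from the sketch above, with an assumed `rate_fn` standing in for rates derived from a learned score network) only illustrates how the uniform-diffusion sampler would be iterated backwards from the noise distribution.

```python
import numpy as np

def sample(rate_fn, d, S, n_steps, T=1.0, seed=0):
    """Iterate the tau-leaping step backwards from time T to 0.

    rate_fn(x, t) -> (d, S) array of reverse-CTMC jump rates; a hypothetical
    interface standing in for rates computed from a learned score network.
    """
    rng = np.random.default_rng(seed)
    # The uniform noising process has the uniform law as its stationary
    # distribution, so initialization draws each coordinate uniformly.
    x = rng.integers(0, S, size=d)
    tau = T / n_steps
    t = T
    for _ in range(n_steps):
        x = tau_leaping_step(x, rate_fn(x, t), tau, rng)
        t -= tau
    return x
```

Under the paper's guarantee for uniform diffusion, a number of windows of order $\tilde O(d/\varepsilon)$ suffices to reach $\varepsilon$ accuracy in KL divergence.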