[2602.21185] The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
Summary
This article summarizes new work on discrete diffusion models that introduces Predictor-Corrector (PC) samplers, which improve sampling quality, and a memory-efficient training curriculum, which reduces training time.
Why It Matters
The findings challenge the assumption that Masked diffusion is the inevitable direction for diffusion-based language modeling, offering samplers whose quality keeps improving as the number of sampling steps grows. This matters for researchers and practitioners in machine learning because it opens new avenues for making generative models both more efficient and more effective.
Key Takeaways
- Predictor-Corrector samplers outperform traditional ancestral sampling in discrete diffusion models.
- The new samplers continue to improve with more sampling steps, unlike ancestral samplers, whose quality plateaus.
- A memory-efficient curriculum reduces training time by 25% and memory usage by 33%.
- The research calls into question the dominance of Masked diffusion in future language modeling.
- Code and resources are made available for further exploration and application.
Computer Science > Machine Learning
arXiv:2602.21185 (cs) [Submitted on 24 Feb 2026]
Title: The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
Authors: Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
Abstract: Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% ...
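The abstract describes Predictor-Corrector sampling only at a high level: a predictor step denoises toward an estimate of the clean data and moves to a lower noise level, while a corrector step re-perturbs and re-denoises at the current level, letting the sampler revisit and self-correct earlier choices. A minimal toy sketch of that loop under a uniform-state forward process is below; the linear schedule, the `denoise_fn` stand-in, and all constants are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Toy predictor-corrector loop for a uniform-state discrete diffusion.
# Everything here (schedule, denoiser, sizes) is an illustrative stand-in.

VOCAB = 8          # toy vocabulary size
SEQ_LEN = 16       # toy sequence length
rng = np.random.default_rng(0)

def alpha(t):
    """Linear schedule: fraction of tokens kept clean at noise level t in [0, 1]."""
    return 1.0 - t

def forward_noise(x, t):
    """Uniform-state forward process: each token is independently replaced
    by a uniform random token with probability 1 - alpha(t)."""
    corrupt = rng.random(x.shape) > alpha(t)
    noise = rng.integers(0, VOCAB, size=x.shape)
    return np.where(corrupt, noise, x)

def denoise_fn(x_t, t, target):
    """Stand-in for a learned denoiser: produces a point estimate of the clean
    sequence. To keep the demo self-contained it cheats by mixing toward a
    fixed `target`, trusting x_t more as the noise level t shrinks."""
    keep = rng.random(x_t.shape) < alpha(t)
    return np.where(keep, x_t, target)

def pc_sample(steps, target, corrector_rounds=1):
    """Predictor: estimate clean data, then re-noise to the next (lower) level.
    Corrector: perturb at the current level and denoise again, so the sampler
    can self-correct tokens it already committed to."""
    x = rng.integers(0, VOCAB, size=SEQ_LEN)        # start from pure noise (t = 1)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = denoise_fn(x, t_cur, target)       # predictor: estimate x0
        x = forward_noise(x0_hat, t_next)           # step down to noise level t_next
        for _ in range(corrector_rounds):           # corrector: perturb + re-denoise
            x = denoise_fn(forward_noise(x, t_next), t_next, target)
    return x

target = rng.integers(0, VOCAB, size=SEQ_LEN)
sample = pc_sample(steps=20, target=target)
print("token agreement with target:", (sample == target).mean())
```

With a real learned denoiser in place of the cheating `denoise_fn`, the corrector rounds are what distinguish this from plain ancestral sampling: they give the model repeated chances to fix tokens at each noise level, which is consistent with the paper's observation that its PC samplers keep improving as steps increase instead of plateauing.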