[2510.18114] Latent-Augmented Discrete Diffusion Models
Summary
The paper presents Latent-Augmented Discrete Diffusion Models (LADD), which enhance discrete diffusion models for language generation by augmenting the token space with latent variables that capture cross-token dependencies.
Why It Matters
This research addresses a key limitation of existing discrete diffusion models: their factored reverse transitions ignore cross-token dependencies, which are crucial for generating coherent and contextually relevant language. The introduction of latent variables could significantly improve performance in natural language processing and generative AI applications, especially when few sampling steps are used.
Key Takeaways
- LADD introduces a learnable auxiliary latent channel for better token dependency management.
- The model can operate in joint or sequential diffusion schedules, enhancing flexibility.
- LADD improves unconditional generation metrics compared to state-of-the-art masked discrete diffusion baselines.
- LADD is effective at lower sampling budgets, making it practical for real-world applications.
- The research provides a foundation for further exploration of latent variable integration in diffusion models.
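The two inference schedules mentioned above can be illustrated with a toy sketch. This is not the paper's implementation: the `predict` function is a random stand-in for the learned denoiser (a real model would condition on the unmasked context, and token predictions would condition on resolved latents), and the vocabulary sizes, step counts, and reveal rule are all hypothetical choices made for illustration.

```python
import random

MASK = -1  # sentinel for an unresolved (masked) slot

def predict(slots, n_values):
    """Toy stand-in for the learned denoiser: proposes a value for every
    masked slot. A real model would condition on the unmasked context."""
    return [s if s != MASK else random.randrange(n_values) for s in slots]

def reveal(slots, proposals, k):
    """Unmask up to k randomly chosen masked slots using the proposals."""
    masked = [i for i, s in enumerate(slots) if s == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        slots[i] = proposals[i]
    return slots

def joint_schedule(length, steps, vocab=4, latent_vals=2):
    """Joint diffusion: tokens and latents are denoised together, step by step."""
    tokens, latents = [MASK] * length, [MASK] * length
    k = -(-length // steps)  # ceil division: slots revealed per step
    for _ in range(steps):
        tokens = reveal(tokens, predict(tokens, vocab), k)
        latents = reveal(latents, predict(latents, latent_vals), k)
    return tokens, latents

def sequential_schedule(length, steps, vocab=4, latent_vals=2):
    """Sequential diffusion: fully resolve latents first, then sample tokens."""
    latents = [MASK] * length
    k = -(-length // steps)
    for _ in range(steps):
        latents = reveal(latents, predict(latents, latent_vals), k)
    tokens = [MASK] * length
    for _ in range(steps):
        # a real model would condition token proposals on the resolved latents
        tokens = reveal(tokens, predict(tokens, vocab), k)
    return tokens, latents
```

Both schedules end with every token and latent slot resolved; they differ only in whether the latent channel is denoised alongside the tokens or resolved up front as a conditioning signal.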
Computer Science > Machine Learning
arXiv:2510.18114 (cs)
[Submitted on 20 Oct 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: Latent-Augmented Discrete Diffusion Models
Authors: Dario Shariatian, Alain Durmus, Umut Simsekli, Stefano Peluchetti
Abstract: Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions that ignore cross-token dependencies and degrade performance in the few-step regime. We propose Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space. The latent variables provide an intermediate representation that can express joint structure while preserving tractable parameterizations. We instantiate LADD with continuous latents (Co-LADD) and discrete latents (Di-LADD), and study two inference schedules: a joint diffusion that denoises data and latents together, and a sequential diffusion that first resolves latents and then samples tokens conditionally. We derive ELBO-style objectives and analyze design choices that balance latent expressivity with diffusion compatibility. In experiments, LADDs yield improvements on unconditional generation metrics as compared to state-of-the-art masked dis...
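The "ELBO-style objectives" the abstract mentions are not spelled out here, but a bound of this kind typically takes the standard discrete-diffusion form extended to the joint (token, latent) space. The following is a generic sketch of such a bound, not the paper's exact objective: $x_t$ and $z_t$ denote the noised tokens and latents at step $t$, $q$ the forward process, and $p_\theta$ the learned reverse process.

$$
\log p_\theta(x_0) \;\ge\; \mathbb{E}_{q}\!\left[\log p_\theta(x_0, z_0 \mid x_1, z_1)\right]
\;-\; \sum_{t=2}^{T} \mathbb{E}_{q}\!\left[ D_{\mathrm{KL}}\!\big(q(x_{t-1}, z_{t-1} \mid x_t, z_t, x_0, z_0)\,\big\|\, p_\theta(x_{t-1}, z_{t-1} \mid x_t, z_t)\big) \right]
\;-\; \mathbb{E}_{q}\!\left[ D_{\mathrm{KL}}\!\big(q(x_T, z_T \mid x_0, z_0)\,\big\|\, p(x_T, z_T)\big) \right]
$$

Each term mirrors the usual diffusion ELBO (reconstruction, per-step denoising KLs, and a prior-matching term), with the single-channel states replaced by joint (token, latent) states.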