[2603.20155] Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
Computer Science > Machine Learning
arXiv:2603.20155 (cs)
[Submitted on 20 Mar 2026]

Title: Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
Authors: Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

Abstract: It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature offers many distillation methods that can reduce sampling to a handful of steps. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). We demonstrate this on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as: arXiv:2603.20155 [cs.LG] (or arXiv:2603.20155v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.20155 (arXiv-issued DOI via DataCite, pending registration)

Submission history
From: Emiel Hoogeboom
[v1] Fri, 20 Mar 2026 17:29:12 UTC (801 KB)
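The abstract does not spell out D-MMD's training objective, so as background only, here is a minimal sketch of the standard maximum mean discrepancy (MMD) that moment-matching distillation builds on: a kernel-based distance between samples from two distributions (here, a hypothetical "teacher" and "student"). This uses a continuous RBF kernel on NumPy arrays; the paper's discrete variant over token sequences will necessarily differ.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Pairwise RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimator of squared MMD between samples x ~ P and y ~ Q."""
    return (rbf_kernel(x, x, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())

# Illustrative only: a mismatched "student" yields a larger MMD than a
# well-matched one, which is the signal a distillation loss would minimize.
rng = np.random.default_rng(0)
teacher = rng.normal(0.0, 1.0, size=(256, 4))
student_near = rng.normal(0.0, 1.0, size=(256, 4))   # matches the teacher
student_far = rng.normal(3.0, 1.0, size=(256, 4))    # shifted distribution
```

In a distillation setting, the student generator would be trained to drive this discrepancy (against teacher samples) toward zero; the squared MMD is zero exactly when the two sample sets coincide.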