[2602.16169] Discrete Stochastic Localization for Non-autoregressive Generation
Summary
The paper presents Discrete Stochastic Localization (DSL), a training method that improves the step-efficiency of masked diffusion language models for non-autoregressive generation, reaching higher output quality with substantially fewer denoiser evaluations.
Why It Matters
This research addresses the challenges of non-autoregressive generation, particularly in reducing decoding latency and error accumulation. By improving the efficiency of masked diffusion models, it has significant implications for the development of faster and more accurate natural language processing systems.
Key Takeaways
- DSL improves the efficiency of non-autoregressive generation methods.
- The method reduces the number of denoiser evaluations needed for high-quality outputs.
- Training alone, without changing the sampler, can substantially improve the step-efficiency of masked diffusion models.
- DSL achieves better self-correction and uncertainty calibration.
- The approach surpasses the MDLM+ReMDM baseline at low step budgets and matches autoregressive quality at high budgets.
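To make the iterative-refinement setting concrete, here is a toy sketch of remasking-style parallel decoding in the spirit of MDLM/ReMDM sampling: predict every position in parallel, keep the most confident tokens, remask the rest, and repeat with a shrinking mask budget. The denoiser below is a random stand-in (a real model would be a Transformer), and the mask sentinel, linear remasking schedule, and function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

MASK = -1  # sentinel id for masked positions (assumption, not from the paper)

def toy_denoiser(draft, vocab_size, rng):
    """Stand-in for a learned denoiser: returns a token guess and a
    confidence per position. A real MDLM would run a Transformer on
    the draft here."""
    probs = rng.random((len(draft), vocab_size))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)

def remask_refine(length, vocab_size, steps, seed=0):
    """Remasking-style iterative refinement: start fully masked, then at
    each step fill all positions in parallel and remask the least
    confident ones under a linearly shrinking budget."""
    rng = np.random.default_rng(seed)
    draft = np.full(length, MASK)
    for t in range(steps, 0, -1):
        tokens, conf = toy_denoiser(draft, vocab_size, rng)
        draft = tokens.copy()
        n_remask = round(length * (t - 1) / steps)  # budget hits 0 at the last step
        if n_remask:
            draft[np.argsort(conf)[:n_remask]] = MASK
    return draft
```

Each call to `remask_refine` costs `steps` denoiser evaluations, which is the budget DSL's training is reported to shrink by roughly 4x relative to MDLM+ReMDM.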
arXiv:2602.16169 (cs) [Submitted on 18 Feb 2026] Title: Discrete Stochastic Localization for Non-autoregressive Generation Authors: Yunshu Wu, Jiayi Cheng, Partha Thakuria, Rob Brekelmans, Evangelos E. Papalexakis, Greg Ver Steeg Abstract: Non-autoregressive (NAR) generation reduces decoding latency by predicting many tokens in parallel, but iterative refinement often suffers from error accumulation and distribution shift under self-generated drafts. Masked diffusion language models (MDLMs) and their remasking samplers (e.g., ReMDM) can be viewed as modern NAR iterative refinement, where generation repeatedly revises a partially observed draft. In this work we show that training alone can substantially improve the step-efficiency of MDLM/ReMDM sampling. We propose DSL (Discrete Stochastic Localization), which trains a single SNR-invariant denoiser across a continuum of corruption levels, bridging intermediate draft noise and mask-style endpoint corruption within one Diffusion Transformer. On OpenWebText, DSL fine-tuning yields large MAUVE gains at low step budgets, surpassing the MDLM+ReMDM baseline with ~4x fewer denoiser evaluations, and matches autoregressive quality at high budgets. Analyses show improved self-correction and uncertainty calibration.
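The abstract's "continuum of corruption levels" can be illustrated with a minimal masking-corruption training pair: sample a corruption rate uniformly, mask that fraction of tokens, and ask the denoiser to recover the clean sequence. This is a generic sketch of mask-style corruption under an assumed uniform rate schedule, not the paper's exact loss, schedule, or SNR-invariance construction; all names here are hypothetical.

```python
import numpy as np

def corrupt(tokens, mask_rate, mask_id, rng):
    """Mask each position independently with probability mask_rate,
    producing a draft at an arbitrary corruption level."""
    mask = rng.random(len(tokens)) < mask_rate
    noisy = tokens.copy()
    noisy[mask] = mask_id
    return noisy, mask

def training_example(tokens, mask_id, rng):
    """One training pair: draw a corruption level uniformly from [0, 1)
    so a single denoiser sees everything from lightly noised drafts to
    (nearly) fully masked inputs during training."""
    rate = rng.uniform(0.0, 1.0)
    noisy, mask = corrupt(np.asarray(tokens), rate, mask_id, rng)
    # The denoiser is trained to predict the clean tokens at masked positions.
    return noisy, np.asarray(tokens), mask
```

Exposing one model to the whole corruption range, rather than only endpoint masking, is what lets a single denoiser handle both intermediate draft noise and mask-style corruption at sampling time.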