[2602.06412] Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Computer Science > Computation and Language

arXiv:2602.06412 (cs) [Submitted on 6 Feb 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Authors: Daisuke Oba, Danushka Bollegala, Masahiro Kaneko, Naoaki Okazaki

Abstract: Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every token position at every step, even when many unmasked tokens are essentially fixed, wasting substantial compute. We propose SureLock: when the posterior at an unmasked position has stabilized across steps (our sure condition), we lock that position, thereafter skipping its query projection and feed-forward sublayers while caching its attention keys and values so that other positions can continue to attend to it. This reduces the dominant per-iteration computational cost from $O(N^2d)$ to $O(MNd)$, where $N$ is the sequence length, $M$ is the number of unlocked token positions, and $d$ is the model dimension. In practice, $M$ decreases as the iteration progresses, yielding substantial savings. On LLaDA-8B, SureLock reduces algorithmic FLOPs by 30--50% relative to the sa...
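As a rough illustration of the mechanism the abstract describes, the following is a minimal single-head PyTorch sketch, not the authors' implementation: the class name SureLockLayer, the helper is_sure, and the threshold tau are assumptions introduced here for illustration.

    # Sketch of a SureLock-style layer (assumption: not the authors' code).
    # Locked positions skip the query projection and feed-forward sublayer;
    # their cached keys/values remain attendable by the active positions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def is_sure(post_prev: torch.Tensor, post_curr: torch.Tensor,
                tau: float = 1e-3) -> torch.Tensor:
        # Hypothetical "sure" test: a position's posterior is considered
        # stabilized when it barely moves between consecutive steps.
        return (post_curr - post_prev).abs().max(dim=-1).values < tau

    class SureLockLayer(nn.Module):
        def __init__(self, d_model: int):
            super().__init__()
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            self.o = nn.Linear(d_model, d_model)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            self.kv_cache = None  # (K, V) reused for locked positions

        def forward(self, x: torch.Tensor, locked: torch.Tensor) -> torch.Tensor:
            # x: (N, d) hidden states; locked: (N,) bool mask of positions
            # whose sure condition was met at an earlier step.
            active = ~locked
            d = x.size(-1)

            # Keys/values: recompute only for the M active positions and
            # reuse the cache for locked ones, so everyone can attend to them.
            if self.kv_cache is None:
                self.kv_cache = (self.k(x), self.v(x))
            K, V = self.kv_cache
            K, V = K.clone(), V.clone()
            K[active] = self.k(x[active])
            V[active] = self.v(x[active])
            self.kv_cache = (K, V)

            # Queries only for active positions: the score matrix is M x N,
            # giving the O(MNd) attention cost instead of O(N^2 d).
            Q = self.q(x[active])
            attn = F.softmax(Q @ K.T / d**0.5, dim=-1) @ V  # (M, d)

            out = x.clone()
            h = x[active] + self.o(attn)
            out[active] = h + self.ffn(h)  # FFN skipped for locked positions
            return out

In this sketch the attention score matrix is M x N rather than N x N, matching the per-iteration cost reduction the abstract reports; a full implementation would additionally handle multi-head attention, layer normalization, and cache management across denoising steps.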