[2603.23507] Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Computer Science > Computation and Language
arXiv:2603.23507 (cs)
[Submitted on 4 Mar 2026]
Title: Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Authors: Fangyu Ding, Ding Ding, Sijin Chen, Kaibo Wang, Peng Xu, Zijin Feng, Haoli Bai, Kai Han, Youliang Yan, Binhang Yuan, Jiacheng Sun
Abstract: While Masked Diffusion Language Models (MDLMs), which rely on token masking and unmasking, have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this paper, we propose Deletion-Insertion Diffusion language models (DID), which rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current MDLMs. DID improves training and inference efficiency by eliminating two major sources of computational overhead in MDLMs: computation over non-informative 1) <MASK> tokens inherent to the masking paradigm, and 2) <PAD> tokens introduced in variable-length settings. Furthermore, DID offers greater flexibility by 1) natively supporting variable-length sequences without requiring fixed-length padding, and 2) providing an intrinsic self-correction mechanism during generation, since insertion dynamically adjusts token positions...
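To make the contrast between the two paradigms concrete, below is a minimal illustrative sketch, not code from the paper: a toy forward corruption step for masking (fixed length, placeholder tokens) versus deletion (sequence shrinks, no placeholders), plus a toy insertion step for the reverse direction. The per-token corruption probability p, the function names, and the position-based insertion interface are assumptions made here for illustration; the paper's rigorous discrete-diffusion formulation of these processes is more involved.

import random

MASK = "<MASK>"

def mask_corrupt(tokens, p):
    """MDLM-style forward step (sketch): replace each token with <MASK>
    with probability p. Sequence length stays fixed, so every masked
    position still costs compute in the model."""
    return [MASK if random.random() < p else t for t in tokens]

def delete_corrupt(tokens, p):
    """DID-style forward step (sketch): delete each token with probability p.
    The sequence shrinks, so no non-informative placeholder tokens
    (<MASK> or <PAD>) are ever processed."""
    return [t for t in tokens if random.random() >= p]

def insert_step(tokens, pos, new_token):
    """DID-style reverse step (sketch): insert a predicted token at a chosen
    position. Later insertions shift surrounding positions, which is the
    hook for the self-correction behavior described in the abstract."""
    return tokens[:pos] + [new_token] + tokens[pos:]

random.seed(0)
seq = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_corrupt(seq, 0.5))    # fixed length, e.g. ['the', '<MASK>', ...]
print(delete_corrupt(seq, 0.5))  # shorter sequence, e.g. ['the', 'sat', 'mat']
print(insert_step(["the", "sat"], 1, "cat"))  # ['the', 'cat', 'sat']

Under this toy view, the efficiency claim follows directly: the deletion-corrupted sequence is strictly shorter than the mask-corrupted one, so a length-proportional model forward pass does less work, and variable-length sequences need no <PAD> tokens to batch.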