[2509.24296] DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Computer Science > Computation and Language
arXiv:2509.24296 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Authors: Zherui Li, Zheng Nie, Zhenhong Zhou, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Yufei Guo, Jiaheng Zhang

Abstract: The rapid advancement of Diffusion Large Language Models (dLLMs) introduces unprecedented vulnerabilities that are fundamentally distinct from those of autoregressive LLMs, stemming from their iterative and parallel generation mechanisms. In this paper, we conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics. Experimental results reveal a harmful bias inherent in the standard greedy remasking strategy and identify a critical phenomenon we term Denoising-path Dependence, where the safety of early-stage tokens decisively influences the final output. These findings also indicate that while current decoding strategies constitute a significant vulnerability, dLLMs possess substantial intrinsic safety potential. To unlock this potential, we propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach: Stoc...
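The "greedy remasking strategy" the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation; it is a toy, confidence-based decoding step of the kind used in diffusion LLMs: at each denoising step, the model commits the masked positions whose predicted tokens have the highest probability and remasks the rest. All names here (`greedy_remask_step`, the toy logits) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def greedy_remask_step(token_logits, masked_positions, k):
    """One illustrative denoising step: commit the k masked positions whose
    argmax token has the highest predicted probability; remask the rest.
    `token_logits[pos]` is a hypothetical per-position logit vector."""
    scored = []
    for pos in masked_positions:
        probs = softmax(token_logits[pos])
        best = max(range(len(probs)), key=probs.__getitem__)
        scored.append((probs[best], pos, best))
    scored.sort(reverse=True)  # highest-confidence positions first
    committed = {pos: tok for _, pos, tok in scored[:k]}
    still_masked = [pos for _, pos, _ in scored[k:]]
    return committed, still_masked

# Toy example: three masked positions, a 2-token vocabulary, commit 2 per step.
logits = {0: [2.0, 0.1], 1: [0.5, 0.4], 2: [3.0, 0.0]}
committed, still_masked = greedy_remask_step(logits, [0, 1, 2], k=2)
# Positions 2 and 0 have the most confident predictions, so they are
# committed first; position 1 stays masked for a later step.
```

Because this rule always resolves the most confident positions first, a few early, confidently decoded harmful tokens can anchor the rest of the sequence, which is one intuition behind the Denoising-path Dependence the abstract describes.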