[2604.08557] Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

Computer Science > Computation and Language
arXiv:2604.08557 (cs)
Submitted on 17 Mar 2026

Title: Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models
Authors: Arth Singh

Abstract: Diffusion-based language models (dLLMs) generate text by iteratively denoising masked token sequences. We show that their safety alignment rests on a single fragile assumption: that the denoising schedule is monotonic and committed tokens are never re-evaluated. Safety-aligned dLLMs commit refusal tokens within the first 8-16 of 64 denoising steps, and the schedule treats these commitments as permanent. A trivial two-step intervention (re-masking these tokens and injecting a 12-token affirmative prefix) achieves 76.1% ASR on HarmBench (n=159, Lg=128) against LLaDA-8B-Instruct and 81.8% ASR (n=159) against Dream-7B-Instruct, without any gradient computation or adversarial search. The simplicity of this exploit is itself the central finding: augmenting with gradient-optimized perturbation via a differentiable Gumbel-softmax chain consistently degrades ASR (e.g., 41.5% vs. 76.1% at Lg=128), confirming that the vulnerability is structural rather than requiring sophisticated exploitation. These findings reveal that dLLM safety is not adversarially robust but architecturally shallow: it holds only because...
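To put the reported percentages on a concrete footing, the arithmetic below backs out how many of the 159 HarmBench prompts each figure would correspond to. This is only a sketch under stated assumptions: it assumes ASR is the plain fraction of successful prompts out of n, rounded to one decimal place, and it assumes the 41.5% figure was also measured on n=159 (the abstract does not say so explicitly). The function names are hypothetical and do not come from the paper.

```python
# Sketch: recover the success counts implied by the ASR figures in the abstract.
# Assumptions (not confirmed by the source): ASR = successes / n, reported to one
# decimal place, and n = 159 for every figure, including the 41.5% one.

def implied_successes(asr_percent: float, n: int) -> int:
    """Return the integer count k for which 100 * k / n is closest to asr_percent."""
    return round(asr_percent / 100 * n)


def asr_percent(successes: int, n: int) -> float:
    """Return the attack success rate as a percentage, rounded to one decimal."""
    return round(100 * successes / n, 1)


N_PROMPTS = 159  # HarmBench sample size stated in the abstract

reported = {
    "LLaDA-8B-Instruct (re-mask + prefix)": 76.1,
    "Dream-7B-Instruct (re-mask + prefix)": 81.8,
    "with gradient-optimized perturbation (n assumed)": 41.5,
}

for label, pct in reported.items():
    k = implied_successes(pct, N_PROMPTS)
    # Round-trip check: the implied count should reproduce the reported rate.
    print(f"{label}: {k}/{N_PROMPTS} = {asr_percent(k, N_PROMPTS)}%")
```

Under these assumptions the reported rates are consistent with 121, 130, and 66 successful prompts out of 159, respectively. On this reading, adding the gradient-optimized perturbation roughly halves the number of successes relative to the simple two-step intervention.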

Originally published on April 13, 2026. Curated by AI News.
