[2602.20193] When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

arXiv - AI · 3 min read · Article

Summary

This paper investigates the impact of encoder-side poisoning on text-to-image models, revealing that traditional evaluations of backdoor attacks are insufficient. It introduces SEMAD (Semantic Alignment and Drift), a diagnostic framework that quantifies the semantic drift and structural degradation such attacks cause.

Why It Matters

Understanding the vulnerabilities of diffusion models to encoder attacks is crucial for improving AI safety. This research highlights the need for more comprehensive evaluations that go beyond simple trigger detection, thus informing future defenses against backdoor attacks.

Key Takeaways

  • Encoder-side poisoning can cause persistent semantic corruption in models.
  • Traditional evaluations of backdoor attacks are inadequate.
  • The SEMAD framework quantifies internal embedding drift and functional misalignment.
  • Backdoors act as low-rank deformations, amplifying local sensitivity.
  • Geometric audits are necessary to assess structural risks in AI models.
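The embedding-drift side of such an audit can be sketched in a few lines. This is a minimal illustration, not the paper's SEMAD implementation: it simply scores per-prompt drift as one minus the cosine similarity between a clean encoder's embeddings and a poisoned encoder's embeddings, with random vectors standing in for real encoder outputs.

```python
import numpy as np

def semantic_drift(clean_emb: np.ndarray, poisoned_emb: np.ndarray) -> np.ndarray:
    """Per-prompt drift: 1 - cosine similarity between clean and
    poisoned embeddings. Rows are prompts, columns are embedding dims."""
    num = np.sum(clean_emb * poisoned_emb, axis=1)
    denom = np.linalg.norm(clean_emb, axis=1) * np.linalg.norm(poisoned_emb, axis=1)
    return 1.0 - num / denom

rng = np.random.default_rng(0)
clean = rng.normal(size=(8, 16))                    # stand-in clean embeddings
poisoned = clean + 0.1 * rng.normal(size=(8, 16))   # mild trigger-free corruption
drift = semantic_drift(clean, poisoned)
print(drift.mean())
```

A real audit would feed a shared set of trigger-free prompts through both encoders; persistent drift on such prompts is exactly the "beyond triggers" corruption the paper describes.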

Computer Science > Cryptography and Security

arXiv:2602.20193 (cs) · [Submitted on 21 Feb 2026]

Title: When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
Authors: Shenyang Chen, Liuwan Zhu

Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this paradigm, demonstrating that encoder-side poisoning induces persistent, trigger-free semantic corruption that fundamentally reshapes the representation manifold. We trace this vulnerability to a geometric mechanism: a Jacobian-based analysis reveals that backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. To rigorously quantify this structural degradation, we introduce SEMAD (Semantic Alignment and Drift), a diagnostic framework that measures both internal embedding drift and downstream functional misalignment. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:...
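The abstract's geometric claim, that backdoors act as low-rank, target-centered deformations, can be illustrated with a toy linear model. In this hedged sketch (not the authors' actual analysis), random matrices stand in for the encoder's clean and poisoned Jacobians, the "backdoor" is an injected rank-one term, and an SVD of the Jacobian difference recovers its low effective rank.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
W_clean = rng.normal(size=(d, d)) / np.sqrt(d)  # stand-in for the clean encoder Jacobian
u = rng.normal(size=(d, 1))                     # target direction in embedding space
v = rng.normal(size=(d, 1))                     # sensitive direction in input space
W_poisoned = W_clean + 2.0 * (u @ v.T)          # rank-1, target-centered deformation

# Spectral audit: how many directions does the deformation occupy?
delta = W_poisoned - W_clean
s = np.linalg.svd(delta, compute_uv=False)
eff_rank = int(np.sum(s > 0.01 * s[0]))         # singular values above 1% of the largest
print(eff_rank)  # → 1: the deformation is concentrated in a single direction
```

A low effective rank means the distortion is not random noise but a coherent push toward one target direction, which is why, per the abstract, it propagates consistently across semantic neighborhoods.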

Related Articles

UMKC Announces New Master of Science in Artificial Intelligence
AI Infrastructure

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
Improving AI models’ ability to explain their predictions
Machine Learning

AI News - General · 9 min ·
Anthropic’s Unreleased Claude Mythos Might Be The Most Advanced AI Model Yet
LLMs

Anthropic is testing an unreleased artificial intelligence (AI) model with capabilities that exceed any system it has previously released...

AI Tools & Products · 5 min ·
LLMs

LLM agents can trigger real actions now. But what actually stops them from executing?

We ran into a simple but important issue while building agents with tool calling: the model can propose actions but nothing actually enfo...

Reddit - Artificial Intelligence · 1 min ·