
[2602.20193] When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

arXiv - AI, February 25, 2026

Summary

This paper investigates encoder-side poisoning of text-to-image (T2I) models and argues that standard backdoor evaluations, which measure trigger activation and visual fidelity, miss the semantic corruption that persists even when no trigger is present. It introduces a diagnostic framework, SEMAD (Semantic Alignment and Drift), to quantify the resulting embedding drift and downstream functional misalignment.

Why It Matters

Understanding how encoder attacks affect diffusion models is crucial for improving AI safety. This research highlights the need for evaluations that go beyond attack success rates and examine the structure of the representation space, which can inform future defenses against backdoor attacks.

Key Takeaways

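Encoder-side poisoning can cause persistent semantic corruption, even on prompts that contain no trigger.
Standard evaluations, which measure trigger activation and visual fidelity, do not capture this damage.
The SEMAD framework quantifies internal embedding drift and downstream functional misalignment.
Backdoors act as low-rank, target-centered deformations that amplify local sensitivity and spread distortion across semantic neighborhoods.
Geometric audits are needed alongside attack success rates to assess the structural risks of encoder poisoning.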
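
As a rough illustration of the embedding-drift idea in the takeaways above, the sketch below compares the embeddings that a clean encoder and a suspect encoder produce for the same trigger-free prompts, using mean cosine distance. This is not SEMAD's actual definition, which this summary does not give; the function names and the toy vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_embedding_drift(clean_embeddings, suspect_embeddings):
    """Average cosine distance between paired embeddings of the same prompts.

    Both arguments are lists of vectors; entry i of each list is assumed
    to be the embedding of the same trigger-free prompt.
    """
    distances = [
        cosine_distance(c, s)
        for c, s in zip(clean_embeddings, suspect_embeddings)
    ]
    return sum(distances) / len(distances)


# Toy data (hypothetical): two prompts embedded in 2-D.
# The suspect encoder leaves prompt 0 alone but collapses prompt 1
# onto prompt 0's embedding, mimicking drift toward a target concept.
clean = [[1.0, 0.0], [0.0, 1.0]]
suspect = [[1.0, 0.0], [1.0, 0.0]]

drift = mean_embedding_drift(clean, suspect)
print(f"mean embedding drift: {drift:.2f}")
```

A value near 0 means the suspect encoder places trigger-free prompts where the clean one does. In this toy case one of the two prompts has been pulled onto the other, so the mean drift is 0.5.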

arXiv:2602.20193 (cs)
[Submitted on 21 Feb 2026]

Title: When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
Authors: Shenyang Chen, Liuwan Zhu

Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models primarily measure trigger activation and visual fidelity. We challenge this paradigm, demonstrating that encoder-side poisoning induces persistent, trigger-free semantic corruption that fundamentally reshapes the representation manifold. We trace this vulnerability to a geometric mechanism: a Jacobian-based analysis reveals that backdoors act as low-rank, target-centered deformations that amplify local sensitivity, causing distortion to propagate coherently across semantic neighborhoods. To rigorously quantify this structural degradation, we introduce SEMAD (Semantic Alignment and Drift), a diagnostic framework that measures both internal embedding drift and downstream functional misalignment. Our findings, validated across diffusion and contrastive paradigms, expose the deep structural risks of encoder poisoning and highlight the necessity of geometric audits beyond simple attack success rates.

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:...
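
The geometric claim in the abstract (backdoors as low-rank, target-centered deformations that amplify local sensitivity) can be illustrated with a toy model. The sketch below is not the paper's analysis: the "encoders" are hypothetical functions on 3-D vectors, the poisoned one adds a bump centered on an assumed target point along one fixed direction, and the Jacobians are estimated by central finite differences with NumPy.

```python
import numpy as np

# Hypothetical target location and push direction in a 3-D embedding space.
TARGET = np.array([1.0, 0.0, 0.0])
DIRECTION = np.array([0.0, 1.0, 0.0])


def clean_encoder(x):
    """Toy clean encoder: the identity map."""
    return x.copy()


def poisoned_encoder(x, strength=2.0):
    """Toy poisoned encoder: identity plus a target-centered bump.

    The bump is largest at TARGET and decays with distance, and it
    always pushes along DIRECTION, so the change it makes to the
    Jacobian is an outer product (rank 1).
    """
    bump = np.exp(-np.sum((x - TARGET) ** 2))
    return x + strength * bump * DIRECTION


def numerical_jacobian(f, x, h=1e-5):
    """Estimate the Jacobian of f at x with central differences."""
    columns = []
    for j in range(x.size):
        step = np.zeros(x.size)
        step[j] = h
        columns.append((f(x + step) - f(x - step)) / (2 * h))
    return np.stack(columns, axis=1)


# Evaluate near the target, where the deformation is strongest.
x = np.array([0.8, 0.1, 0.0])
J_clean = numerical_jacobian(clean_encoder, x)
J_poisoned = numerical_jacobian(poisoned_encoder, x)

# Singular values of the Jacobian difference reveal its rank.
deformation = J_poisoned - J_clean
singular_values = np.linalg.svd(deformation, compute_uv=False)

# Spectral norm as a simple measure of local sensitivity.
clean_sensitivity = np.linalg.norm(J_clean, 2)
poisoned_sensitivity = np.linalg.norm(J_poisoned, 2)

print("singular values of deformation:", singular_values)
print("clean sensitivity:", clean_sensitivity)
print("poisoned sensitivity:", poisoned_sensitivity)
```

In this toy setting only one singular value of the Jacobian difference is meaningfully nonzero, and the spectral norm of the poisoned Jacobian exceeds that of the clean one. An audit of a real encoder would differentiate the actual model, for example with automatic differentiation, rather than a hand-built function.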
