[2512.10766] Metaphor-based Jailbreak Attacks on Text-to-Image Models
Computer Science > Cryptography and Security
arXiv:2512.10766 (cs)
[Submitted on 6 Dec 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Metaphor-based Jailbreak Attacks on Text-to-Image Models
Authors: Chenyu Zhang, Lanjun Wang, Yiwen Ma, Wenhui Li, Yi Tu, An-An Liu

Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreak attacks have shown that adversarial prompts can effectively bypass these mechanisms and induce T2I models to produce sensitive content, revealing critical safety vulnerabilities. However, existing attack methods implicitly assume that the attacker knows the type of deployed defense, which limits their effectiveness against unknown or diverse defense mechanisms. In this work, we reveal an underexplored vulnerability of T2I models to metaphor-based jailbreak attacks (MJA), which aim to bypass diverse defense mechanisms without prior knowledge of their type by generating metaphor-based adversarial prompts. Specifically, MJA consists of two modules: an LLM-based multi-agent generation module (LMAG) and an adversarial prompt optimization module (APO). LMAG decomposes the generation of metaphor-based adversarial prompts into three subtasks: metaphor retrieval, context matching, and adversarial prompt...
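The two-module pipeline described in the abstract (LMAG's three subtasks feeding an optimization loop) can be sketched at a high level as below. All function names, the stubbed metaphor table, and the scoring heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MJA pipeline structure from the abstract.
# The metaphor table, context template, and score are stand-ins.

def retrieve_metaphors(concept: str) -> list[str]:
    """Subtask 1 (LMAG): retrieve candidate metaphors for a concept (stubbed)."""
    table = {"fire": ["blazing rose"], "storm": ["raging sea"]}
    return table.get(concept, [concept])

def match_context(metaphor: str) -> str:
    """Subtask 2 (LMAG): embed the metaphor in a coherent benign context (stubbed)."""
    return f"a painting of {metaphor} at dusk"

def generate_prompts(concept: str) -> list[str]:
    """LMAG: compose metaphor retrieval and context matching into candidate prompts."""
    return [match_context(m) for m in retrieve_metaphors(concept)]

def optimize(prompts: list[str], score) -> str:
    """APO: select the candidate prompt that maximizes a (stand-in) attack score."""
    return max(prompts, key=score)

best = optimize(generate_prompts("fire"), score=len)
```

In the paper, each stubbed step would be carried out by an LLM agent, and the APO score would reflect whether the target T2I model actually produces the intended image; the sketch only shows how the subtasks compose.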