[2512.10766] Metaphor-based Jailbreak Attacks on Text-to-Image Models
Computer Science > Cryptography and Security
arXiv:2512.10766 (cs)
[Submitted on 6 Dec 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Metaphor-based Jailbreak Attacks on Text-to-Image Models
Authors: Chenyu Zhang, Lanjun Wang, Yiwen Ma, Wenhui Li, Yi Tu, An-An Liu

Abstract: Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreak attacks have shown that adversarial prompts can effectively bypass these mechanisms and induce T2I models to produce sensitive content, revealing critical safety vulnerabilities. However, existing attack methods implicitly assume that the attacker knows the type of deployed defense, which limits their effectiveness against unknown or diverse defense mechanisms. In this work, we reveal an underexplored vulnerability of T2I models to metaphor-based jailbreak attacks (MJA), which aim to bypass diverse defense mechanisms without prior knowledge of their type by generating metaphor-based adversarial prompts. Specifically, MJA consists of two modules: an LLM-based multi-agent generation module (LMAG) and an adversarial prompt optimization module (APO). LMAG decomposes the generation of metaphor-based adversarial prompts into three subtasks: metaphor retrieval, context matching, and adversarial prompt...
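The two-module pipeline described in the abstract (LMAG's three subtasks feeding an optimization loop) can be sketched at a high level as below. All function names, the stubbed metaphor table, and the scoring heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MJA pipeline structure from the abstract.
# The metaphor table, context template, and score are stand-ins.

def retrieve_metaphors(concept: str) -> list[str]:
    """Subtask 1 (LMAG): retrieve candidate metaphors for a concept (stubbed)."""
    table = {"fire": ["blazing rose"], "storm": ["raging sea"]}
    return table.get(concept, [concept])

def match_context(metaphor: str) -> str:
    """Subtask 2 (LMAG): embed the metaphor in a coherent benign context (stubbed)."""
    return f"a painting of {metaphor} at dusk"

def generate_prompts(concept: str) -> list[str]:
    """LMAG: compose metaphor retrieval and context matching into candidate prompts."""
    return [match_context(m) for m in retrieve_metaphors(concept)]

def optimize(prompts: list[str], score) -> str:
    """APO: select the candidate prompt that maximizes a (stand-in) attack score."""
    return max(prompts, key=score)

best = optimize(generate_prompts("fire"), score=len)
```

In the paper, each stubbed step would be carried out by an LLM agent, and the APO score would reflect whether the target T2I model actually produces the intended image; the sketch only shows how the subtasks compose.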