[2510.02999] Untargeted Jailbreak Attack
Computer Science > Cryptography and Security

arXiv:2510.02999 (cs) [Submitted on 3 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v4)]

Title: Untargeted Jailbreak Attack
Authors: Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Di Wang, Zhan Qin, Kui Ren

Abstract: Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with predefined target responses. However, restricting the objective to inducing fixed targets inherently constrains the adversarial search space, limiting overall attack efficacy. Furthermore, existing methods typically require numerous optimization iterations to bridge the large gap between the fixed target and the original LLM output, resulting in low attack efficiency. To overcome these limitations, we propose the first gradient-based untargeted jailbreak attack (UJA), which relies on an untargeted objective to maximize the unsafety probability of the LLM output without enforcing any response pattern. For tractable optimization, we further decompose this objective into two differentiable sub-objectives to search for an optimal harmful response and the corresponding adversarial prompt, with a theoretical analysis validating the decomposition. In contrast to existing attacks, UJA's unrestricted objective significantly expands ...
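The core idea in the abstract, optimizing a prompt to maximize a judge's unsafety score rather than cross-entropy to a fixed target string, can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the "LLM" is a linear-softmax map from a continuous prompt vector to a response distribution, the "judge" is a linear unsafety scorer, and gradients are taken by finite differences instead of backpropagation.

```python
import numpy as np

# Toy stand-ins (hypothetical, for illustration only):
#   W              : "LLM" mapping a 4-d continuous prompt to logits over 6 "tokens"
#   unsafe_weights : "judge" scoring how unsafe the induced response distribution is
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
unsafe_weights = rng.normal(size=6)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def unsafety(prompt):
    """Judge's unsafety score of the response distribution the prompt induces."""
    return float(unsafe_weights @ softmax(prompt @ W))

def ascent_step(prompt, lr=0.1, eps=1e-5):
    """One finite-difference gradient-ascent step on the untargeted objective."""
    g = np.zeros_like(prompt)
    for i in range(prompt.size):
        d = np.zeros_like(prompt)
        d[i] = eps
        g[i] = (unsafety(prompt + d) - unsafety(prompt - d)) / (2 * eps)
    return prompt + lr * g

p = np.zeros(4)           # initial (benign) prompt vector
before = unsafety(p)
for _ in range(100):      # ascend the unsafety score, no fixed target needed
    p = ascent_step(p)
after = unsafety(p)
print(after > before)
```

The contrast with targeted attacks is that no target response appears anywhere in the loss: the objective is simply whatever the judge scores as unsafe, which is the "unrestricted search space" point the abstract makes.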