[2604.09253] Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization

arXiv - AI · April 13, 2026 · 4 min read

About this article

Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.09253 (cs)

[Submitted on 10 Apr 2026]

Title: Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization

Authors: Yuqin Lan, Gen Li, Yuanze Hu, Weihao Shen, Zhaoxin Fan, Faguo Wu, Xiao Zhang, Laurence T. Yang, Zhiming Zheng

Abstract: Vision-Language Models (VLMs) are powerful but remain vulnerable to multimodal jailbreak attacks. Existing attacks mainly rely on either explicit visual prompt attacks or gradient-based adversarial optimization. While the former is easier to detect, the latter produces subtle perturbations that are less perceptible, but is usually optimized and evaluated under homogeneous open-source surrogate-target settings, leaving its effectiveness on commercial closed-source VLMs under heterogeneous settings unclear. To examine this issue, we study different surrogate-target settings and observe a consistent gap between homogeneous and heterogeneous settings, a phenomenon we term surrogate dependency. Motivated by this finding, we propose Mosaic, a Multi-view ensemble optimization framework for multimodal jailbreak against closed-source VLMs, which alleviates surrogate dependency under heterogeneous surrogate-target settings by reducing over-reliance on any single surrogate...

Originally published on April 13, 2026. Curated by AI News.
