
[2602.17645] Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

arXiv - AI | February 20, 2026

Summary

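This paper presents M-Attack-V2, a gradient-denoising upgrade to the transfer-based M-Attack for black-box adversarial attacks on Large Vision-Language Models (LVLMs). It stabilizes optimization by averaging gradients over multiple local views of the source image and by replacing aggressive target augmentation with a small auxiliary target set.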

Why It Matters

LVLMs are increasingly deployed in real applications, so understanding how adversarial images transfer to models whose gradients are inaccessible matters for AI safety. This research identifies why an existing transfer-based attack optimizes unstably and proposes a way to stabilize it. The findings could inform future robustness evaluations, security measures, and model designs.

Key Takeaways

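M-Attack-V2 significantly improves black-box attack success rates on LVLMs.
Multi-Crop Alignment (MCA) addresses gradient instability by averaging gradients from multiple independently sampled local views per iteration.
Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary target set, which smooths the target manifold and reduces variance.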
The research highlights the importance of fine-grained detail targeting in adversarial attacks.
Code and data are publicly available, promoting transparency and further research.
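
The variance-reduction effect of averaging several local views can be illustrated with a toy simulation. The sketch below is not the paper's implementation and touches no real model: it assumes each view's gradient is a shared signal direction plus independent Gaussian noise, and every name and parameter in it (noisy_gradient, mean_consecutive_cosine, noise_std, dim, num_views) is hypothetical. It compares how well gradient estimates from consecutive iterations agree in direction when one view versus sixteen views are averaged per step.

```python
import math
import random


def noisy_gradient(true_grad, noise_std, rng):
    """One simulated local-view gradient: shared signal plus Gaussian noise (assumption)."""
    return [g + rng.gauss(0.0, noise_std) for g in true_grad]


def average_gradients(grads):
    """Element-wise mean of a list of equally sized gradient vectors."""
    k = len(grads)
    return [sum(column) / k for column in zip(*grads)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_consecutive_cosine(num_views, steps=200, dim=256, noise_std=0.25, seed=0):
    """Mean cosine similarity between the gradient estimates of consecutive steps."""
    rng = random.Random(seed)
    # Unit-norm signal direction shared by every simulated view.
    true_grad = [1.0 / math.sqrt(dim)] * dim
    previous = None
    similarities = []
    for _ in range(steps):
        views = [noisy_gradient(true_grad, noise_std, rng) for _ in range(num_views)]
        estimate = average_gradients(views)
        if previous is not None:
            similarities.append(cosine(previous, estimate))
        previous = estimate
    return sum(similarities) / len(similarities)


single = mean_consecutive_cosine(num_views=1)
multi = mean_consecutive_cosine(num_views=16)
print(f"1 view per step:   mean cosine = {single:.3f}")
print(f"16 views per step: mean cosine = {multi:.3f}")
```

Under these assumptions, single-view estimates from consecutive steps come out close to orthogonal, while the 16-view averages agree far more in direction. Real crop gradients are not independent Gaussian perturbations, so this only conveys the statistical intuition behind averaging.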

Computer Science > Machine Learning
arXiv:2602.17645 (cs) [Submitted on 19 Feb 2026]

Title: Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Authors: Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen

Abstract: Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set f...
