[2603.29410] AGFT: Alignment-Guided Fine-Tuning for Zero-Shot

[2603.29410] AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

arXiv - AI April 01, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.29410: AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Computer Science > Computer Vision and Pattern Recognition arXiv:2603.29410 (cs) [Submitted on 31 Mar 2026] Title:AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models Authors:Yubo Cui, Xianchao Guan, Zijun Xiong, Zheng Zhang View a PDF of the paper titled AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models, by Yubo Cui and 3 other authors View PDF HTML (experimental) Abstract:Pre-trained vision-language models (VLMs) exhibit strong zero-shot generalization but remain vulnerable to adversarial perturbations. Existing classification-guided adversarial fine-tuning methods often disrupt pre-trained cross-modal alignment, weakening visual-textual correspondence and degrading zero-shot performance. In this paper, we propose an Alignment-Guided Fine-Tuning (AGFT) framework that enhances zero-shot adversarial robustness while preserving the cross-modal semantic structure. Unlike label-based methods that rely on hard labels and fail to maintain the relative relationships between image and text, AGFT leverages the probabilistic predictions of the original model for text-guided adversarial training, which aligns adversarial visual features with textual embeddings via soft alignment distributions, improving zero-shot adversarial robustness. To address structural discrepancies introduced by fine-tuning, we introduce a distribution consistency calibration mechanism that adjusts the robust model ou...

Originally published on April 01, 2026. Curated by AI News.

Llms

GPT-4 vs Claude vs Gemini for coding — honest breakdown after 3 months of daily use

I am a solo developer who has been using all three seriously. Here is what I actually think: GPT-4o — Strengths: Large context window, st...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Llms

You're giving feedback on a new version of ChatGPT

So I will be paying attention to these system messages more now- the last time I got one of these not so long back the 'tone' changed to ...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Llms

Gemma 4 actually running usable on an Android phone (not llama.cpp)

I wanted a real local assistant on my phone, not a demo. First tried the usual llama.cpp in Termux — Gemma 4 was 2–3 tok/s and the phone ...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

Llms

Claude vs Gemini: Solving the laden knight's tour problem

AI Coding contest day 8 The eighth challenge is a weighted variant of the classic knight's tour. The knight must visit every square of a ...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

[2603.29410] AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

About this article

Related Articles

GPT-4 vs Claude vs Gemini for coding — honest breakdown after 3 months of daily use

You're giving feedback on a new version of ChatGPT

Gemma 4 actually running usable on an Android phone (not llama.cpp)

Claude vs Gemini: Solving the laden knight's tour problem

No comments

Stay updated with AI News