[2512.16523] TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.16523 (cs)
[Submitted on 18 Dec 2025 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models
Authors: Zhiwei Li, Yitian Pang, Weining Wang, Zhenan Sun, Qi Li

Abstract: Vision-Language Models (VLMs) such as CLIP achieve impressive zero-shot recognition performance but remain highly susceptible to adversarial perturbations, posing significant risks in safety-critical scenarios. Previous training-time defenses rely on adversarial fine-tuning, which requires labeled data and costly retraining, while existing test-time strategies fail to reliably distinguish clean from adversarial inputs, preventing both adversarial robustness and clean accuracy from reaching their optimum. To address these limitations, we propose Test-Time Padding (TTP), a lightweight defense framework that performs adversarial detection followed by targeted adaptation at inference. TTP identifies adversarial inputs via the cosine similarity shift between CLIP feature embeddings computed before and after spatial padding, yielding a universal threshold for reliable detection across architectures and datasets. For detected adversarial case...
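The detection step described in the abstract, comparing feature embeddings before and after spatial padding against a similarity threshold, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `encode` stands in for a CLIP image encoder, and the padding size `pad` and threshold `tau` are placeholder values, since the paper's universal threshold is not given in the excerpt above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pad_image(img: np.ndarray, pad: int = 16) -> np.ndarray:
    """Zero-pad an image spatially on all sides (H, W, C layout)."""
    return np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")

def detect_adversarial(img: np.ndarray, encode, tau: float = 0.9,
                       pad: int = 16) -> bool:
    """Flag the input as adversarial when padding shifts its embedding
    enough that the before/after cosine similarity drops below tau.
    `encode` maps an image array to a feature vector (e.g. a CLIP
    image encoder); tau and pad are illustrative defaults."""
    sim = cosine_similarity(encode(img), encode(pad_image(img, pad)))
    return sim < tau

# Toy usage with a stand-in encoder (per-channel sum, which zero
# padding leaves unchanged, so a clean input is not flagged):
img = np.ones((8, 8, 3))
encode = lambda x: x.sum(axis=(0, 1))
print(detect_adversarial(img, encode))  # False
```

In the paper's setting, adversarial perturbations are expected to be fragile to this spatial shift, so their embeddings move more under padding than those of clean inputs, which is what the threshold exploits.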