[2407.17491] Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.17491 (cs)

[Submitted on 4 Jul 2024 (v1), last revised 1 Apr 2026 (this version, v3)]

Title: Robust Adaptation of Foundation Models with Black-Box Visual Prompting

Authors: Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song

Abstract: With the surge of large-scale pre-trained models (PTMs), parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, PETL methods commonly rely on two optimistic assumptions: 1) full access to the parameters of the PTM, and 2) sufficient memory capacity to cache all intermediate activations for gradient computation. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without full parameter accessibility. Moreover, the large memory requirements of modern PTMs are hard to meet. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts, which allow the target PTM to adapt in the wild. SPSA-GC efficiently estimates th...
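
To make the black-box optimization idea concrete, below is a minimal Python sketch of an SPSA-style gradient estimate applied to prompt-generator parameters. The abstract only names SPSA-GC; the exact gradient-correction rule is not given here, so the momentum-based look-ahead in `spsa_gc_step`, as well as the function names and hyperparameters, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def spsa_gradient_estimate(loss_fn, phi, c):
    """One-sample SPSA estimate of the gradient of a black-box loss at phi.

    loss_fn: black-box objective (e.g., task loss queried through a PTM API).
    phi:     current prompt-generator parameters (1-D array).
    c:       perturbation magnitude for this step.
    """
    # Rademacher (+/-1) perturbation direction, one entry per parameter.
    delta = np.random.choice([-1.0, 1.0], size=phi.shape)
    # Two black-box queries: losses at the positively and negatively perturbed points.
    loss_plus = loss_fn(phi + c * delta)
    loss_minus = loss_fn(phi - c * delta)
    # Simultaneous-perturbation estimate: (y+ - y-) / (2c * delta_i) per coordinate.
    return (loss_plus - loss_minus) / (2.0 * c * delta)

def spsa_gc_step(loss_fn, phi, momentum, a, c, beta=0.9):
    """One hypothetical update step: SPSA with a momentum-style correction.

    This uses a Nesterov-like look-ahead as a stand-in for the paper's
    gradient correction; the actual SPSA-GC rule may differ.
    """
    # Estimate the gradient at a look-ahead point rather than at phi itself.
    g_hat = spsa_gradient_estimate(loss_fn, phi + beta * momentum, c)
    momentum = beta * momentum - a * g_hat
    return phi + momentum, momentum
```

In this sketch, each update needs only two forward queries to the black-box model, regardless of the number of prompt parameters, which is what makes SPSA-style estimation attractive when gradients and intermediate activations of the PTM are unavailable.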