[2506.06905] Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Computer Science > Artificial Intelligence

arXiv:2506.06905 (cs)
[Submitted on 7 Jun 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Authors: Akash Gupta, Amos Storkey, Mirella Lapata

Abstract: Large Multimodal Models (LMMs) often rely on in-context learning (ICL) to perform new visual question answering (VQA) tasks with minimal supervision. However, ICL performance, especially in smaller LMMs, does not always improve monotonically when increasing the number of examples. We hypothesize that this happens because the LMM is overwhelmed by extraneous information in the image embeddings that is irrelevant to the downstream task. To address this, we propose a meta-learning approach that induces few-shot capabilities in LMMs through a fixed set of soft prompts distilled from task-relevant visual features, which are adapted at test time using a small number of examples. We facilitate this distillation through an attention-mapper module that can be easily integrated with any LMM architecture and is jointly learned with soft prompts. Evaluation on the VL-ICL Bench shows that our method successfully achieves task adaptation in low-data regimes with just a few gradient steps, outperforming ICL by 21.2%. Comparisons with parameter-efficient finetuning meth...
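The abstract describes an attention-mapper that distills a fixed set of soft prompts from visual features. A common way to realize such a module is cross-attention from a small bank of learned query vectors to the image token embeddings. The sketch below is an illustrative reconstruction under that assumption, not the paper's actual implementation; all names (`AttentionMapper`, `num_prompts`, the single-head projections) are hypothetical, and the paper's module may differ in heads, normalization, and how the prompts are adapted at test time.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentionMapper:
    """Hypothetical sketch: K learned queries cross-attend over N visual
    tokens, compressing them into K task-relevant soft prompt vectors.
    In the meta-learning setup these parameters would be the ones updated
    with a few gradient steps on the support examples at test time."""

    def __init__(self, num_prompts, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Learned prompt queries and single-head key/value projections.
        self.queries = rng.normal(scale=0.02, size=(num_prompts, dim))
        self.W_k = rng.normal(scale=0.02, size=(dim, dim))
        self.W_v = rng.normal(scale=0.02, size=(dim, dim))

    def __call__(self, visual_tokens):
        """visual_tokens: (N, dim) image embeddings -> (K, dim) soft prompts."""
        K = visual_tokens @ self.W_k
        V = visual_tokens @ self.W_v
        attn = softmax(self.queries @ K.T / np.sqrt(K.shape[-1]))
        return attn @ V

# Example: compress 196 patch embeddings into 8 soft prompts.
mapper = AttentionMapper(num_prompts=8, dim=64)
patches = np.random.default_rng(1).normal(size=(196, 64))
prompts = mapper(patches)   # shape (8, 64)
```

The design choice worth noting is that the output size is fixed by the number of queries, not the number of image tokens, which is what lets the soft prompts act as a bottleneck filtering out task-irrelevant visual information.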