[2510.13315] Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.13315 (cs)
[Submitted on 15 Oct 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Authors: Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal capabilities, but they inherit the tendency to hallucinate from their underlying language models. While visual contrastive decoding has been proposed to mitigate this issue, existing methods often apply generic visual augmentations that disregard the specific context provided by the text query, limiting their effectiveness. This study introduces a novel training-free decoding strategy that addresses these limitations through two key contributions. First, a self-augmentation prompting strategy leverages the intrinsic knowledge of the model to dynamically align semantics between the query and the visual augmentation. Second, an adaptive thresholding algorithm adjusts the next-token candidate set size based on output sparsity, utilizing the full information in the logit distribution. Extensive experiments across four LVLMs and seven benchmarks demonstrate that the proposed decoding significantly enhances factual c...
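To make the second contribution more concrete, the sketch below shows one way a sparsity-adaptive threshold over the full logit distribution could select the next-token candidate set, widening when the distribution is flat and narrowing when it is peaked. The function name `adaptive_candidates`, the entropy-based scaling rule, and the constant `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's algorithm): choose next-token candidates with a
# cutoff that depends on the sparsity of the logit distribution, instead of a fixed
# top-k or top-p. All constants below are illustrative assumptions.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def adaptive_candidates(logits: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Return indices of next-token candidates.

    The cutoff scales with the normalized entropy of the distribution: a peaked
    (low-entropy) distribution keeps only tokens near the max probability, while
    a flat (high-entropy) distribution admits more tokens. `alpha` is an assumed
    base cutoff, not a value from the paper.
    """
    probs = softmax(logits)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    norm_entropy = entropy / np.log(len(probs))            # in [0, 1]
    threshold = alpha * (1.0 - norm_entropy) * probs.max()
    return np.nonzero(probs >= threshold)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    peaked = np.array([8.0, 2.0, 1.5, 1.0, 0.5])           # sparse, confident step
    flat = rng.normal(0.0, 0.3, size=5)                     # near-uniform, uncertain step
    print("peaked candidates:", adaptive_candidates(peaked))
    print("flat candidates:  ", adaptive_candidates(flat))
```

Run on the peaked example, only the top token survives the cutoff; on the near-uniform example, all tokens remain candidates, illustrating how the candidate set size adapts to output sparsity before any contrastive adjustment is applied.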