[2510.13315] Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2510.13315: Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models

arXiv:2510.13315 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 15 Oct 2025 (v1); last revised 3 Mar 2026 (this version, v2)
Title: Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Authors: Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal capabilities, but they inherit the tendency to hallucinate from their underlying language models. While visual contrastive decoding has been proposed to mitigate this issue, existing methods often apply generic visual augmentations that disregard the specific context provided by the text query, limiting their effectiveness. This study introduces a novel training-free decoding strategy that addresses these limitations, featuring two key contributions. First, a self-augmentation prompting strategy that leverages the intrinsic knowledge of the model to dynamically align semantics between the query and the visual augmentation. Second, an adaptive thresholding algorithm that adaptively adjusts next token candidate size based on the output sparsity, utilizing full information from the logit distribution. Extensive experiments across four LVLMs and seven benchmarks demonstrate that the proposed decoding significantly enhances factual c...
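The abstract does not give Self-Aug's formulas, so the following is only a minimal sketch of the two ingredients it builds on, under stated assumptions. The contrast step uses the formulation popularized by Visual Contrastive Decoding (VCD): score each token by (1 + alpha) times its logit under the original image minus alpha times its logit under an augmented image, and rank only tokens that pass a plausibility cutoff relative to the top probability. The "adaptive" part here is a hypothetical stand-in for the paper's thresholding: the cutoff is scaled by the normalized entropy of the output distribution, so a peaked (sparse) distribution keeps few candidates and a flat one keeps many. The names adaptive_candidates and decode_step, the linear entropy-to-threshold mapping, and the beta_min, beta_max and alpha values are illustrative assumptions, not the authors' method. The query-aware self-augmentation prompting (choosing which augmentation to apply) is not modeled; the augmented logits are simply taken as input.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab size), so roughly in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def contrastive_logits(logits_orig, logits_aug, alpha=1.0):
    """VCD-style contrast: boost tokens the original image supports more
    than the augmented image does."""
    return [(1 + alpha) * lo - alpha * la for lo, la in zip(logits_orig, logits_aug)]

def adaptive_candidates(probs, beta_min=0.01, beta_max=0.5):
    """HYPOTHETICAL entropy-adaptive plausibility cutoff (not Self-Aug's rule).

    Low entropy (peaked output) -> cutoff near beta_max -> few candidates.
    High entropy (flat output)  -> cutoff near beta_min -> many candidates.
    The linear mapping and the beta values are illustrative only.
    """
    h = normalized_entropy(probs)
    beta = beta_max - (beta_max - beta_min) * h
    cutoff = beta * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]

def decode_step(logits_orig, logits_aug, alpha=1.0):
    """Greedy next-token choice: keep plausible candidates under the
    original view, then rank them by the contrastive score."""
    candidates = adaptive_candidates(softmax(logits_orig))
    scores = contrastive_logits(logits_orig, logits_aug, alpha)
    return max(candidates, key=lambda i: scores[i])

if __name__ == "__main__":
    # Toy 4-token vocabulary. Token 0 keeps its logit under augmentation
    # (likely driven by the language prior); token 1 drops (visually grounded).
    orig = [5.0, 4.8, 0.0, -2.0]
    aug = [5.0, 3.0, 0.0, -2.0]
    greedy = max(range(len(orig)), key=lambda i: orig[i])
    print(greedy)                   # prints 0
    print(decode_step(orig, aug))   # prints 1
```

Plain greedy decoding picks token 0, while the contrastive step switches to token 1 because its support collapses when the image is distorted. The candidate filter is what keeps the contrast safe: if the augmented view assigned token 2 a logit of -10, its contrastive score would exceed token 1's, but token 2 is never considered because its probability under the original view falls below the entropy-scaled cutoff.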

Originally published on March 04, 2026. Curated by AI News.