[2510.13315] Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.13315 (cs)
[Submitted on 15 Oct 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Authors: Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta

Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal capabilities, but they inherit the tendency to hallucinate from their underlying language models. While visual contrastive decoding has been proposed to mitigate this issue, existing methods often apply generic visual augmentations that disregard the specific context provided by the text query, limiting their effectiveness. This study introduces a novel training-free decoding strategy that addresses these limitations through two key contributions. First, a self-augmentation prompting strategy leverages the intrinsic knowledge of the model to dynamically align semantics between the query and the visual augmentation. Second, an adaptive thresholding algorithm adjusts the next-token candidate set size based on output sparsity, utilizing the full information in the logit distribution. Extensive experiments across four LVLMs and seven benchmarks demonstrate that the proposed decoding significantly enhances factual c...
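To make the second contribution more concrete, the sketch below shows one way a sparsity-adaptive threshold over the full logit distribution could select the next-token candidate set, widening when the distribution is flat and narrowing when it is peaked. The function name `adaptive_candidates`, the entropy-based scaling rule, and the constant `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's algorithm): choose next-token candidates with a
# cutoff that depends on the sparsity of the logit distribution, instead of a fixed
# top-k or top-p. All constants below are illustrative assumptions.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def adaptive_candidates(logits: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Return indices of next-token candidates.

    The cutoff scales with the normalized entropy of the distribution: a peaked
    (low-entropy) distribution keeps only tokens near the max probability, while
    a flat (high-entropy) distribution admits more tokens. `alpha` is an assumed
    base cutoff, not a value from the paper.
    """
    probs = softmax(logits)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    norm_entropy = entropy / np.log(len(probs))            # in [0, 1]
    threshold = alpha * (1.0 - norm_entropy) * probs.max()
    return np.nonzero(probs >= threshold)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    peaked = np.array([8.0, 2.0, 1.5, 1.0, 0.5])           # sparse, confident step
    flat = rng.normal(0.0, 0.3, size=5)                     # near-uniform, uncertain step
    print("peaked candidates:", adaptive_candidates(peaked))
    print("flat candidates:  ", adaptive_candidates(flat))
```

Run on the peaked example, only the top token survives the cutoff; on the near-uniform example, all tokens remain candidates, illustrating how the candidate set size adapts to output sparsity before any contrastive adjustment is applied.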