[2603.26330] Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation


Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.26330 (cs) [Submitted on 27 Mar 2026]

Title: Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation

Authors: Yiming Ren, Yujiu Yang, Junjie Wang

Abstract: Supervised fine-tuning (SFT) on visual instruction data often improves perceptual capabilities in vision-language models (VLMs) while degrading reasoning performance, creating a persistent "reasoning tax" during post-training. We investigate whether this degradation is related to disrupted access to depth-wise representations, and find that even fixed cross-depth aggregation substantially restores reasoning, suggesting that preserved cross-depth access is an important missing factor in VLM fine-tuning. Building on this observation, we propose Input-Adaptive Depth Aggregation (IADA), a lightweight mechanism that makes cross-depth retrieval input-adaptive, modality-aware, and efficiently parameterized through a low-rank bottleneck. On Qwen3-VL-2B, IADA improves the average reasoning score by 9.5 points and the average perception score by 3.3 points over LoRA-only fine-tuning with only 0.14M additional parameters, with the strongest gains appearing in parameter-efficient low-rank settings.

Subjects: Computer Vision and Pattern...
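The abstract describes IADA only at a high level: per-input weights over the hidden states of all transformer layers, produced through a low-rank bottleneck. A minimal sketch of that idea might look like the following. Note the specifics here are assumptions, not the paper's method: the function name `iada_sketch`, using the final-layer state as the input summary, and combining layers with a softmax-weighted sum are all illustrative choices; the modality-aware component mentioned in the abstract is omitted entirely.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def iada_sketch(layer_states, Wa, Wb):
    """Input-adaptive weighted sum over per-layer hidden states.

    layer_states: list of L vectors (one per layer), each of dim d.
    Wa: d x r matrix, Wb: r x L matrix -- the low-rank bottleneck
        mapping an input summary to L depth logits (d -> r -> L).
    Returns a single d-dim vector: a convex combination of the layers.
    """
    L, d = len(layer_states), len(layer_states[0])
    r = len(Wa[0])
    # Illustrative choice: summarize the input by its final-layer state.
    pooled = layer_states[-1]
    # Low-rank projection: d-dim summary -> r-dim bottleneck -> L logits.
    z = [sum(pooled[i] * Wa[i][k] for i in range(d)) for k in range(r)]
    logits = [sum(z[k] * Wb[k][l] for k in range(r)) for l in range(L)]
    w = softmax(logits)  # input-dependent weight per layer
    # Aggregate hidden states across depth with the learned weights.
    return [sum(w[l] * layer_states[l][i] for l in range(L)) for i in range(d)]
```

The low-rank factorization is what keeps the parameter count small: the bottleneck costs d*r + r*L parameters instead of the d*L a full projection would need, which is plausibly how the method stays near the 0.14M figure quoted in the abstract, though the paper's exact parameterization is not spelled out here.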

Originally published on March 30, 2026. Curated by AI News.

