[2603.01195] VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

arXiv - AI · March 03, 2026 · 4 min read

About this article

Abstract page for arXiv paper 2603.01195: VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.01195 (cs)
[Submitted on 1 Mar 2026]

Title: VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

Authors: Mingkang Dong, Hongyi Cai, Jie Li, Sifan Zhou, Bin Ren, Kunyu Peng, Yuqian Fu

Abstract: The effectiveness of multimodal instruction tuning depends not only on dataset scale, but critically on whether training samples genuinely require visual reasoning. However, existing instruction datasets often contain a substantial portion of visually redundant samples (solvable from text alone), as well as multimodally misaligned supervision that can degrade learning. To address this, we propose VisNec (Visual Necessity Score), a principled data selection framework that measures the marginal contribution of visual input during instruction tuning. By comparing predictive loss with and without visual context, VisNec identifies whether a training instance is vision-critical, redundant, or misaligned. To preserve task diversity, we combine VisNec with semantic clustering and select high-necessity samples within each cluster. Across 10 downstream benchmarks, training on only 15% of the LLaVA-665K dataset selected by VisNec achieves 100.2% of full-data performance. On the smaller Vision-Flan-186K dataset, our sel...
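The selection idea in the abstract (compare predictive loss with and without the image, then keep high-necessity samples within each semantic cluster) can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the exact scoring formula, so the score here is assumed to be a plain loss difference. The names `visual_necessity`, `categorize`, `select_per_cluster`, the `margin` threshold, and the reading of a negative score as "misaligned" are all hypothetical. Per-sample losses and cluster ids are assumed to have been computed elsewhere.

```python
from collections import defaultdict


def visual_necessity(loss_text_only, loss_with_image):
    """Assumed score: how much the image lowers the predictive loss."""
    return loss_text_only - loss_with_image


def categorize(score, margin=0.1):
    """Map a score to the three categories named in the abstract.

    The margin and the sign convention are illustrative assumptions.
    """
    if score > margin:
        return "vision-critical"  # the image clearly helps
    if score < -margin:
        return "misaligned"       # the image makes the prediction worse
    return "redundant"            # text alone does about as well


def select_per_cluster(samples, keep_ratio=0.15):
    """Keep the highest-scoring fraction of samples inside each cluster.

    `samples` is a list of dicts with the hypothetical keys
    'id', 'cluster', 'loss_text_only', and 'loss_with_image'.
    """
    by_cluster = defaultdict(list)
    for s in samples:
        score = visual_necessity(s["loss_text_only"], s["loss_with_image"])
        by_cluster[s["cluster"]].append((score, s["id"]))

    selected = []
    for cluster in sorted(by_cluster):
        ranked = sorted(by_cluster[cluster], key=lambda t: t[0], reverse=True)
        k = max(1, round(len(ranked) * keep_ratio))  # at least one per cluster
        selected.extend(sample_id for _, sample_id in ranked[:k])
    return selected


# Toy data with made-up losses, two clusters of two samples each.
toy_samples = [
    {"id": "a", "cluster": 0, "loss_text_only": 2.0, "loss_with_image": 0.5},
    {"id": "b", "cluster": 0, "loss_text_only": 1.0, "loss_with_image": 1.0},
    {"id": "c", "cluster": 1, "loss_text_only": 0.5, "loss_with_image": 1.5},
    {"id": "d", "cluster": 1, "loss_text_only": 1.2, "loss_with_image": 0.4},
]
toy_selection = select_per_cluster(toy_samples, keep_ratio=0.5)
```

Ranking inside each cluster, rather than globally, is what keeps every cluster represented in the subset, which matches the abstract's stated goal of preserving task diversity.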

Originally published on March 03, 2026. Curated by AI News.

