[2506.09082] AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
Computer Science > Computer Vision and Pattern Recognition

arXiv:2506.09082 (cs)

[Submitted on 10 Jun 2025 (v1), last revised 31 Mar 2026 (this version, v4)]

Title: AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

Authors: Zheda Mai, Arpita Chowdhury, Zihe Wang, Sooyoung Jeon, Lemeng Wang, Jiacheng Hou, Wei-Lun Chao

Abstract: The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction-tuning data may not align with VQA test distributions, meaning a wrong prediction can stem from such data mismatch rather than a VFM's visual shortcomings; (ii) VQA benchmarks often require multiple visual abilities, making it hard to tell whether errors stem from lacking all required abilities or just a single critical one. To address these gaps, we introduce AVA-Bench, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs) -- foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching training and test distributions within each AVA...
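To make the decoupled-evaluation idea concrete, below is a minimal, hypothetical sketch of what probing one atomic ability at a time with matched train/test distributions could look like. The `encode` function, the `AVAS` list (only 3 of the 14 abilities shown), and the random dummy data are all placeholders; this is not AVA-Bench's actual pipeline, just an illustration of evaluating a frozen VFM per ability so that each score reflects that single skill.

```python
# Sketch: per-ability probing of a frozen VFM (all names/data hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def encode(images):
    # Stand-in for a frozen VFM encoder: maps images to feature vectors.
    return rng.normal(size=(len(images), 512))

# A few of the 14 atomic visual abilities, as separate evaluation tracks.
AVAS = ["localization", "depth_estimation", "spatial_understanding"]

scores = {}
for ava in AVAS:
    # Train and test splits drawn from the same distribution for this
    # single ability, so errors cannot be blamed on data mismatch.
    X_train, y_train = encode(range(200)), rng.integers(0, 2, 200)
    X_test, y_test = encode(range(50)), rng.integers(0, 2, 50)

    # A lightweight head fit per ability, instead of one general LLM head.
    head = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores[ava] = head.score(X_test, y_test)  # accuracy for this AVA alone

print(scores)
```

Because each ability gets its own matched data and its own head, a low score on one track points at a gap in that specific skill of the VFM, rather than at a mismatch between instruction-tuning data and the test set or at an entangled mix of abilities.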