[2410.22492] RealCQA-V2: A Diagnostic Benchmark for Structured Visual Entailment over Scientific Charts
Computer Science > Artificial Intelligence
arXiv:2410.22492 (cs)
[Submitted on 29 Oct 2024 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: RealCQA-V2: A Diagnostic Benchmark for Structured Visual Entailment over Scientific Charts
Authors: Saleem Ahmed, Srirangaraj Setlur, Venu Govindaraju

Abstract: Multimodal reasoning models often produce fluent answers supported by seemingly coherent rationales. Existing benchmarks evaluate only final-answer correctness. They do not support atomic visual entailment verification of intermediate steps, especially visual compositional logic. This limitation is especially acute in scientific chart understanding, where answers depend on deterministically grounded visual semantics such as axes, legends, and quantitative relations. We introduce RealCQA-V2, a large-scale benchmark that reformulates chart question answering as Visual Premise Proving (VPP): a structured logical entailment task over chart-grounded visual predicates. Each question is deconstructed into manually curated, atomic premises grounded in chart elements (axes, legends, marks, and quantitative relations), yielding executable reasoning chains rather than free-form textual rationales. These premises form compositional reasoning chains, enabling verification at the level of individual v...
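
To make the idea of an executable premise chain concrete, here is a minimal Python sketch. It is not the benchmark's actual data format or evaluation code: the `Premise` class, the `verify_chain` function, and the dictionary layout of `chart` are all hypothetical names chosen for illustration. The sketch assumes a chart has already been parsed into structured fields and that each atomic premise can be checked as a boolean predicate over those fields.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical parsed-chart representation (an assumption, not the paper's schema).
Chart = Dict[str, Any]


@dataclass(frozen=True)
class Premise:
    """One atomic, chart-grounded statement with an executable check."""
    text: str
    check: Callable[[Chart], bool]


def verify_chain(chart: Chart, premises: List[Premise]) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Evaluate every premise and report per-step results.

    The conclusion counts as entailed only if all premises hold, so a
    failure can be traced to the specific step that broke the chain.
    """
    results = [(p.text, bool(p.check(chart))) for p in premises]
    return all(ok for _, ok in results), results


# Toy chart with made-up values, for illustration only.
chart = {
    "x_label": "Year",
    "legend": ["A", "B"],
    "series": {"A": [1.0, 2.0, 3.0], "B": [2.0, 2.5, 2.0]},
}

premises = [
    Premise("The x-axis is labeled 'Year'", lambda c: c["x_label"] == "Year"),
    Premise("Series 'A' appears in the legend", lambda c: "A" in c["legend"]),
    Premise(
        "The maximum of 'A' exceeds the maximum of 'B'",
        lambda c: max(c["series"]["A"]) > max(c["series"]["B"]),
    ),
]

entailed, step_results = verify_chain(chart, premises)
for text, ok in step_results:
    print(f"{'PASS' if ok else 'FAIL'}: {text}")
print("Entailed:", entailed)
```

Reporting the result of each step, rather than only the final verdict, reflects the abstract's point that final-answer correctness alone does not reveal which intermediate premise failed.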