[2511.15090] SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning
Computer Science > Databases
arXiv:2511.15090 (cs)
[Submitted on 19 Nov 2025 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning

Authors: Wenhan Yu, Zhaoxi Zhang, Wang Chen, Guanqiang Qi, Weikang Li, Lei Sha, Deguo Xia, Jizhou Huang

Abstract: Scientific documents contain complex multimodal structures, which makes evidence localization and scientific reasoning in Document Visual Question Answering particularly challenging. However, most existing benchmarks evaluate models only at the page level without explicitly annotating the evidence regions that support the answer, which limits both interpretability and the reliability of evaluation. To address this limitation, we introduce SciEGQA, a scientific document question answering and reasoning dataset with semantic evidence grounding, where supporting evidence is represented as semantically coherent document regions annotated with bounding boxes. SciEGQA consists of two components: a **human-annotated fine-grained benchmark** containing 1,623 high-quality question-answer pairs, and a **large-scale automatically constructed training set** with over 30K QA pairs generated through an automated data construction pipeline. Extensive experiments on a wide range of Visio...