[2601.14289] RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
Computer Science > Computation and Language
arXiv:2601.14289 (cs)
[Submitted on 14 Jan 2026 (v1), last revised 30 Apr 2026 (this version, v2)]

Title: RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
Authors: Yelin Chen, Fanjin Zhang, Suping Sun, Yunhe Pang, Yuanchun Wang, Jian Song, Xiaoyan Li, Lei Hou, Shu Zhao, Jie Tang, Juanzi Li

Abstract: Understanding research papers remains challenging for foundation models due to specialized scientific discourse and complex figures and tables, yet existing benchmarks offer limited fine-grained evaluation at scale. To address this gap, we introduce RPC-Bench, a large-scale question-answering benchmark built from review-rebuttal exchanges of high-quality computer science papers, containing 15K human-verified QA pairs. We design a fine-grained taxonomy aligned with the scientific research flow to assess models' ability to understand and answer why, what, and how questions in scholarly contexts. We also define an elaborate LLM-human interaction annotation framework to support large-scale labeling and quality control. Following the LLM-as-a-Judge paradigm, we develop a scalable evaluation framework that scores models on correctness-completeness and conciseness, with high agreement with human judgment. Experiments reveal that even the strongest models (GPT-5) achieve on...
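To make the LLM-as-a-Judge setup concrete, below is a minimal sketch of how such a scoring loop might look. The abstract only states that answers are graded on correctness-completeness and conciseness; the judge model name, prompt wording, 1-5 score scale, and JSON output format in this sketch are all assumptions, not the authors' actual protocol.

```python
# Illustrative LLM-as-a-Judge sketch (not RPC-Bench's actual code).
# Assumed: judge model, rubric prompt, and 1-5 scale; the abstract only
# names the two axes (correctness-completeness, conciseness).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading an answer to a question about a research paper.
Question: {question}
Reference answer: {reference}
Model answer: {answer}

Rate the model answer on two axes, each as an integer from 1 to 5:
- correctness_completeness: is it factually right and does it cover the reference?
- conciseness: does it avoid irrelevant or redundant content?
Respond with JSON: {{"correctness_completeness": int, "conciseness": int}}"""

def judge(question: str, reference: str, answer: str,
          model: str = "gpt-4o") -> dict:
    """Ask a judge LLM to score one QA pair; returns the two rubric scores."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
        response_format={"type": "json_object"},  # force parseable JSON
        temperature=0,  # deterministic grading for reproducibility
    )
    return json.loads(resp.choices[0].message.content)
```

In a setup like this, scores would be averaged over the benchmark's QA pairs per model, and judge reliability would be checked by measuring agreement against human-assigned scores on a labeled subset, mirroring the human-agreement validation the abstract mentions.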