[2511.06899] RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation
Summary
The paper presents the Reasoning Process Tree Score (RPTS), a metric for Large Vision-Language Models (LVLMs) that scores the reasoning process step by step rather than only the final answer.
Why It Matters
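Most multimodal benchmarks grade LVLMs through multiple-choice or short-answer formats that ignore the reasoning process. The few that do inspect reasoning tend to use simplistic methods and apply them only when the final answer is wrong, so flawed reasoning that still reaches a correct answer goes undetected. By introducing RPTS and the RPTS-Eval benchmark, the authors aim to give a more faithful picture of how LVLMs reason, which could in turn guide the development of more robust models.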
Key Takeaways
- RPTS evaluates reasoning processes in LVLMs, not just final answers.
- The new RPTS-Eval benchmark includes 374 images and 390 reasoning instances.
- The study identifies limitations in current LVLMs and highlights differences in performance between open-source and commercial models.
- RPTS organizes reasoning steps into a tree and uses its hierarchy to assign a weighted faithfulness score to each step, adjusting the weights dynamically.
- The evaluation also considers intermodal relationships (how visual and textual information relate) to study their impact on reasoning.
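To make the weighted-tree idea concrete, here is a minimal Python sketch. It is not the paper's formula: the `Step` class, the `tree_score` function, the depth-based `decay` weighting, and the assumption that every step already carries a faithfulness score in [0, 1] (for example from a human or model judge) are all hypothetical. The tree is assumed to be rooted at the final conclusion with supporting steps as children, and the weights here are fixed, whereas RPTS adjusts them dynamically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One reasoning step. `faithfulness` is an assumed judge score in [0, 1]."""
    text: str
    faithfulness: float
    children: list[Step] = field(default_factory=list)


def tree_score(root: Step, decay: float = 0.5) -> float:
    """Weighted mean of step faithfulness scores over the whole tree.

    Hypothetical weighting: a step at depth d gets weight decay**d, so steps
    closer to the root (the final conclusion) count more. This is an
    illustrative choice, not the RPTS weighting from the paper.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    total, weight_sum = 0.0, 0.0
    stack = [(root, 0)]  # iterative depth-first traversal: (node, depth)
    while stack:
        node, depth = stack.pop()
        w = decay ** depth
        total += w * node.faithfulness
        weight_sum += w
        stack.extend((child, depth + 1) for child in node.children)
    return total / weight_sum


# The final answer is correct, but one supporting step misreads the image.
answer = Step("So the answer is B.", 1.0, [
    Step("The chart shows sales rose in Q3.", 0.0),  # unfaithful to the image
    Step("Rising sales imply option B.", 1.0),
])
print(tree_score(answer))  # 0.75
```

Here the sketch scores the process at 0.75, while an answer-only metric would give full credit. This is the case the authors argue existing benchmarks miss: flawed reasoning that still lands on the correct answer.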
arXiv:2511.06899 (Computer Science: Computation and Language). Submitted on 10 Nov 2025 (v1); last revised 25 Feb 2026 (v3).
Authors: Haofeng Wang, Yu Zhang
Abstract: Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks. However, most of these benchmarks evaluate models primarily through multiple-choice or short-answer formats, which do not take the reasoning process into account. Although some benchmarks assess the reasoning process, their methods are often overly simplistic and only examine reasoning when answers are incorrect. This approach overlooks scenarios where flawed reasoning leads to correct answers. In addition, these benchmarks do not consider the impact of intermodal relationships on reasoning. To address this issue, we propose the Reasoning Process Tree Score (RPTS), a tree structure-based metric to assess reasoning processes. Specifically, we organize the reasoning steps into a reasoning tree and leverage its hierarchical information to assign weighted faithfulness scores to each reasoning step. By dynamically adjusting these weights, RPTS not only evaluates the overall correctness of the reasoning, b...