[2602.17053] RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

[2602.17053] RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

arXiv - AI 4 min read Article

Summary

The paper introduces RFEval, a benchmark for assessing reasoning faithfulness in large reasoning models, highlighting issues of unfaithfulness despite accuracy.

Why It Matters

As AI systems increasingly influence decision-making, ensuring the reliability of their reasoning processes is critical. This research provides a framework to evaluate and improve the trustworthiness of large reasoning models, emphasizing that accuracy alone is insufficient for reliable AI.

Key Takeaways

  • RFEval benchmarks reasoning faithfulness with 7,186 instances across seven tasks.
  • 49.7% of outputs from evaluated models showed unfaithfulness, primarily due to stance inconsistency.
  • Accuracy does not reliably indicate reasoning faithfulness, necessitating new evaluation methods.
  • Failures are more common in specific domains like math and code, linked to post-training regimes.
  • Trustworthy AI requires optimizing both outcomes and the integrity of reasoning processes.

Computer Science > Artificial Intelligence arXiv:2602.17053 (cs) [Submitted on 19 Feb 2026] Title:RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models Authors:Yunseok Han, Yejoon Lee, Jaeyoung Do View a PDF of the paper titled RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models, by Yunseok Han and 2 other authors View PDF HTML (experimental) Abstract:Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined by two testable conditions: stance consistency (a coherent stance linking reasoning to answer) and causal influence (the stated reasoning causally drives the answer under output-level interventions), explicitly decoupled from accuracy. To operationalize this, we present RFEval, a benchmark of 7,186 instances across seven tasks that probes faithfulness via controlled, output-level counterfactual interventions. Evaluating twelve open-source LRMs, we find unfaithfulness in 49.7% of outputs, predominantly from stance inconsistency. Failures are concentrated in brittle, convergent domains such as math and code, and correlate more with post-training regimes than with scale: within-family ablations indicate that adding current RL-style objectives on top of s...

Related Articles

Top 10 AI certifications and courses for 2026
Ai Startups

Top 10 AI certifications and courses for 2026

This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and t...

AI Events · 15 min ·
[2604.01989] Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
Llms

[2604.01989] Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

Abstract page for arXiv paper 2604.01989: Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

arXiv - AI · 4 min ·
[2604.01447] Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars
Machine Learning

[2604.01447] Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

Abstract page for arXiv paper 2604.01447: Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

arXiv - AI · 3 min ·
[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Llms

[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

arXiv - AI · 4 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime