[2602.17053] RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Summary
The paper introduces RFEval, a benchmark for assessing reasoning faithfulness in large reasoning models, showing that models often produce unfaithful rationales even when their final answers are accurate.
Why It Matters
As AI systems increasingly influence decision-making, ensuring the reliability of their reasoning processes is critical. This research provides a framework to evaluate and improve the trustworthiness of large reasoning models, emphasizing that accuracy alone is insufficient for reliable AI.
Key Takeaways
- RFEval benchmarks reasoning faithfulness with 7,186 instances across seven tasks.
- 49.7% of outputs from evaluated models showed unfaithfulness, primarily due to stance inconsistency.
- Accuracy does not reliably indicate reasoning faithfulness, necessitating new evaluation methods.
- Failures concentrate in convergent domains such as math and code, and correlate more with post-training regimes than with model scale.
- Trustworthy AI requires optimizing both outcomes and the integrity of reasoning processes.
Computer Science > Artificial Intelligence
arXiv:2602.17053 (cs) [Submitted on 19 Feb 2026]
Title: RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Authors: Yunseok Han, Yejoon Lee, Jaeyoung Do
Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined by two testable conditions: stance consistency (a coherent stance linking reasoning to answer) and causal influence (the stated reasoning causally drives the answer under output-level interventions), explicitly decoupled from accuracy. To operationalize this, we present RFEval, a benchmark of 7,186 instances across seven tasks that probes faithfulness via controlled, output-level counterfactual interventions. Evaluating twelve open-source LRMs, we find unfaithfulness in 49.7% of outputs, predominantly from stance inconsistency. Failures are concentrated in brittle, convergent domains such as math and code, and correlate more with post-training regimes than with scale: within-family ablations indicate that adding current RL-style objectives on top of s...
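The two faithfulness conditions in the abstract can be sketched in code. The following is a minimal toy illustration, not the paper's actual evaluation harness: the `Trace` type, the last-token stance extractor, and the `answer_fn` stub are all hypothetical stand-ins for a real model and a real stance classifier. It shows the shape of the protocol: check that the reasoning's concluding stance matches the answer, then apply an output-level counterfactual edit to the stated conclusion and check whether the answer tracks the edit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    reasoning: str  # the model's stated chain of thought
    answer: str     # the model's final answer


def stance_consistent(trace: Trace, extract_stance: Callable[[str], str]) -> bool:
    """Condition 1 (stance consistency): the stance concluded by the
    reasoning must agree with the final answer."""
    return extract_stance(trace.reasoning) == trace.answer


def causally_influential(answer_fn: Callable[[str], str],
                         trace: Trace,
                         counterfactual: str) -> bool:
    """Condition 2 (causal influence): intervene on the output-level
    reasoning by swapping the stated conclusion for a counterfactual one;
    if the reasoning causally drives the answer, the answer should follow
    the edited conclusion."""
    edited = trace.reasoning.replace(trace.answer, counterfactual)
    return answer_fn(edited) == counterfactual


def unfaithfulness_rate(traces: List[Trace],
                        extract_stance: Callable[[str], str],
                        answer_fn: Callable[[str], str],
                        counterfactual_of: Callable[[Trace], str]) -> float:
    """Fraction of traces failing either condition (cf. the 49.7% figure,
    which the paper computes with its own intervention protocol)."""
    bad = sum(
        not (stance_consistent(t, extract_stance)
             and causally_influential(answer_fn, t, counterfactual_of(t)))
        for t in traces
    )
    return bad / len(traces)


# Toy demo: a "model" that faithfully reads its answer off its reasoning,
# plus one deliberately stance-inconsistent trace.
extract = lambda r: r.split()[-1]   # toy stance extractor: last token
follow = lambda r: r.split()[-1]    # toy answer_fn that follows the reasoning
traces = [
    Trace("2 + 2 equals 4", "4"),   # faithful
    Trace("2 + 2 equals 4", "5"),   # stance-inconsistent, hence unfaithful
]
rate = unfaithfulness_rate(traces, extract, follow, lambda t: "9")
# rate == 0.5: one of the two traces fails the stance-consistency check
```

In this sketch the counterfactual is a simple string substitution; the paper's interventions are controlled and task-specific, but the decision logic (faithful only if both conditions hold) is the same.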