[2603.22582] Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
About this article
Abstract page for arXiv paper 2603.22582: Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?
Computer Science > Computation and Language arXiv:2603.22582 (cs) [Submitted on 23 Mar 2026] Title:Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models? Authors:Richard J. Young View a PDF of the paper titled Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?, by Richard J. Young View PDF HTML (experimental) Abstract:Chain-of-thought (CoT) reasoning has been proposed as a transparency mechanism for large language models in safety-critical deployments, yet its effectiveness depends on faithfulness (whether models accurately verbalize the factors that actually influence their outputs), a property that prior evaluations have examined in only two proprietary models, finding acknowledgment rates as low as 25% for Claude 3.7 Sonnet and 39% for DeepSeek-R1. To extend this evaluation across the open-weight ecosystem, this study tests 12 open-weight reasoning models spanning 9 architectural families (7B-685B parameters) on 498 multiple-choice questions from MMLU and GPQA Diamond, injecting six categories of reasoning hints (sycophancy, consistency, visual pattern, metadata, grader hacking, and unethical information) and measuring the rate at which models acknowledge hint influence in their CoT when hints successfully alter answers. Across 41,832 inference runs, overall faithfulness rates range from 39.7% (Seed-1.6-Flash) to 89.9% (DeepSeek-V3.2-Speciale) across model families, with consistency hints (35.5%) and sycophancy hints (53.9%) e...