[2602.13904] Diagnosing Pathological Chain-of-Thought in Reasoning Models
Summary
This paper proposes metrics for diagnosing pathological chain-of-thought (CoT) reasoning in AI models. It covers three failure modes that make CoT unreliable for monitoring and validates the metrics on models deliberately trained to exhibit them.
Why It Matters
CoT is a critical intervention point for AI safety, but only when the visible reasoning reflects how the model actually reaches its answer. This research provides a practical toolkit for assessing whether a model's CoT can be trusted for monitoring, which matters as AI systems become more integrated into decision-making processes.
Key Takeaways
- Addresses three CoT pathologies identified in prior work: post-hoc rationalization, encoded reasoning, and internalized reasoning.
- Proposes metrics for diagnosing and distinguishing these pathologies that are simple to implement, computationally inexpensive, and task-agnostic.
- Develops model organisms, that is, models deliberately trained to exhibit specific CoT pathologies, to validate the proposed metrics.
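The takeaways above do not spell out the paper's metrics, so the following is only an illustrative sketch of what a cheap, task-agnostic check for post-hoc rationalization could look like, not the authors' method. It assumes a hypothetical `answer_fn(question, cot)` callable that returns a model's final answer given a (possibly truncated) chain of thought; the function name `truncation_sensitivity` and both toy models are invented for this example. The idea is that if the answer never changes as the CoT is cut short, the visible reasoning may not be what produces the answer.

```python
from typing import Callable, Sequence


def truncation_sensitivity(
    answer_fn: Callable[[str, str], str],
    question: str,
    cot_steps: Sequence[str],
) -> float:
    """Fraction of strict CoT prefixes whose answer differs from the full-CoT answer.

    answer_fn is a hypothetical stand-in for querying a model with a fixed CoT.
    """
    if not cot_steps:
        return 0.0
    full_answer = answer_fn(question, " ".join(cot_steps))
    changed = 0
    # Try every strict prefix: 0 steps, 1 step, ..., n-1 steps.
    for k in range(len(cot_steps)):
        prefix = " ".join(cot_steps[:k])
        if answer_fn(question, prefix) != full_answer:
            changed += 1
    return changed / len(cot_steps)


# Toy stand-ins for real models (assumptions for illustration only).
def posthoc_model(question: str, cot: str) -> str:
    # Ignores the CoT entirely: the answer is fixed in advance.
    return "42"


def cot_dependent_model(question: str, cot: str) -> str:
    # Only reaches the answer once the final reasoning step is present.
    return "42" if "therefore 42" in cot else "unknown"


steps = ["6 times 7", "is 42", "therefore 42"]
posthoc_score = truncation_sensitivity(posthoc_model, "What is 6 * 7?", steps)
dependent_score = truncation_sensitivity(cot_dependent_model, "What is 6 * 7?", steps)
```

A score near 0 is consistent with an answer that does not depend on the written reasoning, while a score near 1 suggests the answer tracks the CoT. On its own, a low score cannot distinguish post-hoc rationalization from other explanations, such as a question easy enough to answer without any reasoning.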
Computer Science > Artificial Intelligence
arXiv:2602.13904 (cs)
[Submitted on 14 Feb 2026]
Title: Diagnosing Pathological Chain-of-Thought in Reasoning Models
Authors: Manqing Liu, David Williams-King, Ida Caspary, Linh Le, Hannes Whittingham, Puria Radmard, Cameron Tice, Edward James Young
Abstract: Chain-of-thought (CoT) reasoning is fundamental to modern LLM architectures and represents a critical intervention point for AI safety. However, CoT reasoning may exhibit failure modes that we note as pathologies, which prevent it from being useful for monitoring. Prior work has identified three distinct pathologies: post-hoc rationalization, where models generate plausible explanations backwards from predetermined answers; encoded reasoning, where intermediate steps conceal information within seemingly interpretable text; and internalized reasoning, where models replace explicit reasoning with meaningless filler tokens while computing internally. To better understand and discriminate between these pathologies, we create a set of concrete metrics that are simple to implement, computationally inexpensive, and task-agnostic. To validate our approach, we develop model organisms deliberately trained to exhibit specific CoT pathologies. Our work provides a practical toolkit for assessing CoT pathologies, with direct implications ...