[2602.18905] TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
Summary
The paper presents the Trustworthy Unified Explanation Framework (TRUE), which enhances the interpretability of large language model (LLM) reasoning by integrating executable reasoning verification, feasible-region DAG modeling, and causal failure mode analysis.
Why It Matters
As LLMs become increasingly prevalent in decision-making processes, understanding their reasoning is crucial for trust and reliability. TRUE addresses the shortcomings of existing explanation methods, providing a structured approach to enhance transparency and accountability in AI systems.
Key Takeaways
- TRUE integrates executable reasoning verification, feasible-region DAG modeling, and causal failure mode analysis for LLMs.
- The framework provides explanations at the instance, local structural, and class levels, enhancing interpretability.
- It addresses limitations of existing methods by focusing on reasoning stability and failure mechanisms.
- Extensive experiments demonstrate the framework's effectiveness across various benchmarks.
- TRUE establishes a principled paradigm for reliable AI reasoning systems.
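The "reasoning stability" notion in the takeaways above can be illustrated with a toy sketch: apply structure-consistent perturbations to an input (same structure, jittered values) and measure the fraction of perturbed inputs on which the reasoning trace still executes. All names here are illustrative assumptions, not the paper's actual API.

```python
import random

def perturb(inputs: dict, rng: random.Random, scale: int = 1) -> dict:
    """Structure-consistent perturbation: same keys, jittered numeric values."""
    return {k: v + rng.randint(-scale, scale) for k, v in inputs.items()}

def stability_score(is_executable, inputs: dict, n: int = 100, seed: int = 0) -> float:
    """Fraction of structure-consistent perturbations that remain executable."""
    rng = random.Random(seed)
    return sum(is_executable(perturb(inputs, rng)) for _ in range(n)) / n

# Toy reasoning trace "divide a by b": executable only while b != 0.
def divides_ok(inputs: dict) -> bool:
    try:
        inputs["a"] / inputs["b"]
        return True
    except ZeroDivisionError:
        return False

# b = 1 can be perturbed to 0, so the trace is only partially stable here.
score = stability_score(divides_ok, {"a": 6, "b": 1})
```

An input whose entire perturbation neighborhood stays executable would score 1.0; scores below that expose how fragile the trace is in the local input space.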
Computer Science > Machine Learning — arXiv:2602.18905 (cs)
Submitted on 21 Feb 2026
Title: TRUE: A Trustworthy Unified Explanation Framework for Large Language Model Reasoning
Authors: Yujiao Yang
Abstract: Large language models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, yet their decision-making processes remain difficult to interpret. Existing explanation methods often lack trustworthy structural insight and are limited to single-instance analysis, failing to reveal reasoning stability and systematic failure mechanisms. To address these limitations, we propose the Trustworthy Unified Explanation Framework (TRUE), which integrates executable reasoning verification, feasible-region directed acyclic graph (DAG) modeling, and causal failure mode analysis. At the instance level, we redefine reasoning traces as executable process specifications and introduce blind execution verification to assess operational validity. At the local structural level, we construct feasible-region DAGs via structure-consistent perturbations, enabling explicit characterization of reasoning stability and the executable region in the local input space. At the class level, we introduce a causal failure mode analysis method that identifies recurring structural failure patterns and quantifies their causal influence us...
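The instance-level idea in the abstract — treating a reasoning trace as an executable process specification and re-running it step by step, without access to the model's stated answer — can be sketched as follows. The `Step` and `blind_execute` names are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    """One step of a reasoning trace, expressed as an executable operation."""
    op: Callable[[dict], dict]  # transforms an execution state
    description: str

def blind_execute(steps: list[Step], initial_state: dict) -> dict:
    """Re-run the trace without seeing the model's answer; a step that
    raises marks the trace operationally invalid and records where."""
    state = dict(initial_state)
    for step in steps:
        try:
            state = step.op(state)
        except Exception:
            return {"valid": False, "failed_at": step.description}
    return {"valid": True, "final_state": state}

# Toy trace: "add a and b, then double the result".
trace = [
    Step(lambda s: {**s, "x": s["a"] + s["b"]}, "add a and b"),
    Step(lambda s: {**s, "y": 2 * s["x"]}, "double the sum"),
]
result = blind_execute(trace, {"a": 3, "b": 4})
```

Because the verifier only sees the process specification, a trace that reaches a valid final state confirms operational validity independently of whatever answer the model asserted.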