[2506.22740] Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Summary
The paper presents a decision-theoretic framework for evaluating explanations in AI, treating them as information signals valued by the expected improvement they enable on downstream decision-making tasks.
Why It Matters
Understanding how to evaluate explanations in AI is crucial for enhancing human-AI collaboration. This framework provides a structured approach to assess the effectiveness of explanations, which can lead to better decision support systems and improved interpretability in AI applications.
Key Takeaways
- Introduces a decision-theoretic framework for evaluating explanations.
- Defines three estimands for assessing explanation value: theoretical benchmark, human-complementary value, and behavioral value.
- Demonstrates practical application in human-AI decision support and mechanistic interpretability.
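The first estimand, the theoretical benchmark, can be illustrated as a standard value-of-information calculation: compare the best expected payoff a Bayes-optimal agent achieves with and without the explanation signal. The sketch below is a minimal, hypothetical instantiation; the states, signal accuracy, and utilities are illustrative assumptions, not numbers from the paper.

```python
# Hedged sketch of the "theoretical benchmark" estimand: the value of an
# explanation, modeled as an information signal, to a Bayes-optimal agent.
# All distributions and payoffs below are illustrative assumptions.
import numpy as np

def expected_payoff_gain(prior, likelihood, utility):
    """Expected improvement in decision quality from observing the signal.

    prior:      p(state), shape (S,)
    likelihood: p(signal | state), shape (S, K)
    utility:    u(action, state), shape (A, S)
    """
    # Best expected payoff with no signal (baseline decision policy).
    baseline = np.max(utility @ prior)

    # Joint p(state, signal) and marginal p(signal).
    joint = prior[:, None] * likelihood          # shape (S, K)
    p_signal = joint.sum(axis=0)                 # shape (K,)

    # For each signal value, act optimally under the posterior p(state | signal).
    with_signal = 0.0
    for k in range(likelihood.shape[1]):
        if p_signal[k] == 0:
            continue
        posterior = joint[:, k] / p_signal[k]
        with_signal += p_signal[k] * np.max(utility @ posterior)

    return with_signal - baseline  # nonnegative for a Bayes-optimal agent

# Toy setup: two states, two actions, a binary signal that is 80% accurate.
prior = np.array([0.5, 0.5])
likelihood = np.array([[0.8, 0.2],
                       [0.2, 0.8]])
utility = np.array([[1.0, 0.0],   # action 0 pays off in state 0
                    [0.0, 1.0]])  # action 1 pays off in state 1
gain = expected_payoff_gain(prior, likelihood, utility)  # 0.8 - 0.5 = 0.3
```

Under this setup the signal raises the achievable expected payoff from 0.5 to 0.8, so the benchmark value of the explanation is 0.3. The human-complementary and behavioral estimands would swap the Bayes-optimal policy here for a baseline human policy and for observed human behavior, respectively.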
Computer Science > Artificial Intelligence
arXiv:2506.22740 (cs.AI)
Submitted on 28 Jun 2025 (v1), last revised 22 Feb 2026 (this version, v3)
Title: Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Authors: Ziyang Guo, Berk Ustun, Jessica Hullman
Abstract: Explanations of model behavior are commonly evaluated via proxy properties weakly tied to the purposes explanations serve in practice. We contribute a decision theoretic framework that treats explanations as information signals valued by the expected improvement they enable on a specified decision task. This approach yields three distinct estimands: 1) a theoretical benchmark that upper-bounds achievable performance by any agent with the explanation, 2) a human-complementary value that quantifies the theoretically attainable value that is not already captured by a baseline human decision policy, and 3) a behavioral value representing the causal effect of providing the explanation to human decision-makers. We instantiate these definitions in a practical validation workflow, and apply them to assess explanation potential and interpret behavioral effects in human-AI decision support and mechanistic interpretability.
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as: arXiv:2506.22740 [cs.AI]