[2410.12439] Beyond Attribution: Unified Concept-Level Explanations
Summary
The paper presents UnCLE, a framework that enhances model-agnostic explanation techniques by integrating concept-based approaches, offering richer and more faithful explanations across various model types.
Why It Matters
As machine learning models become increasingly complex, understanding their decision-making processes is crucial. This research addresses the limitations of current explanation methods, providing a unified approach that enhances interpretability and usability for end-users, which is vital for trust and transparency in AI applications.
Key Takeaways
- UnCLE bridges the gap between model-agnostic and concept-based explanation techniques.
- It offers three types of explanations: attributions, sufficient conditions, and counterfactuals.
- The framework enhances the fidelity and richness of explanations, catering to diverse user needs.
- Evaluations on popular text, image, and multimodal models show UnCLE outperforming existing state-of-the-art methods.
- This work is significant for improving AI transparency and user trust.
Computer Science > Machine Learning
arXiv:2410.12439 (cs)
[Submitted on 16 Oct 2024 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: Beyond Attribution: Unified Concept-Level Explanations
Authors: Junhao Liu, Haonan Yu, Xin Zhang
Abstract: There is an increasing need to integrate model-agnostic explanation techniques with concept-based approaches, as the former can explain models across different architectures while the latter makes explanations more faithful and understandable to end-users. However, existing concept-based model-agnostic explanation methods are limited in scope, mainly focusing on attribution-based explanations while neglecting diverse forms like sufficient conditions and counterfactuals, thus narrowing their utility. To bridge this gap, we propose a general framework UnCLE to elevate existing local model-agnostic techniques to provide concept-based explanations. Our key insight is that we can uniformly extend existing local model-agnostic methods to provide unified concept-based explanations with large pre-trained model perturbation. We have instantiated UnCLE to provide concept-based explanations in three forms: attributions, sufficient conditions, and counterfactuals, and applied it to popular text, image, and multimodal models. Our evaluation results demonstrate that UnCLE provides explanations more...
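To make the three explanation forms concrete, here is a minimal toy sketch. It is not the paper's implementation: the keyword "classifier", the concept spans, and the perturbation (plain concept deletion rather than a large pre-trained model producing natural perturbed inputs, as UnCLE does) are all illustrative assumptions. It only shows what attribution, sufficient-condition, and counterfactual explanations look like at the concept level.

```python
# Toy sketch of concept-level explanations in three forms.
# Assumptions (not from the paper): a keyword-count "model" and
# hand-picked concept spans; perturbation is simple deletion instead
# of infilling with a large pre-trained model.

from itertools import combinations

POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"boring", "awful"}

def score(words):
    # Hypothetical sentiment score: positive minus negative keyword hits.
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def predict(words):
    return "positive" if score(words) > 0 else "negative"

# The input, segmented into human-meaningful concepts (word spans here).
concepts = {"acting": ["great", "acting"],
            "plot": ["boring", "plot"],
            "music": ["excellent", "music"]}

def realize(kept):
    # Build the perturbed input that keeps only the given concepts.
    return [w for c in kept for w in concepts[c]]

full = list(concepts)
base = score(realize(full))

# 1) Attribution: prediction-score change when one concept is occluded.
attribution = {c: base - score(realize([k for k in full if k != c]))
               for c in full}

# 2) Sufficient condition: a smallest concept subset preserving the label.
sufficient = next(set(s) for n in range(1, len(full) + 1)
                  for s in combinations(full, n)
                  if predict(realize(list(s))) == predict(realize(full)))

# 3) Counterfactual: a smallest concept removal that flips the prediction.
counterfactual = next(set(full) - set(s) for n in range(len(full) - 1, -1, -1)
                      for s in combinations(full, n)
                      if predict(realize(list(s))) != predict(realize(full)))
```

On this toy input the base prediction is "positive"; the "plot" concept gets a negative attribution, keeping only "acting" is sufficient to preserve the label, and removing a single positive concept suffices as a counterfactual. UnCLE's contribution, per the abstract, is obtaining all three forms uniformly from existing local model-agnostic methods with realistic perturbations.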