[2504.10833] Measuring the (Un)Faithfulness of Concept-Based Explanations
Computer Science > Machine Learning

arXiv:2504.10833 (cs)
[Submitted on 15 Apr 2025 (v1), last revised 27 Mar 2026 (this version, v4)]

Title: Measuring the (Un)Faithfulness of Concept-Based Explanations
Authors: Shubham Kumar, Narendra Ahuja

Abstract: Deep vision models perform input-output computations that are hard to interpret. Concept-based explanation methods (CBEMs) increase interpretability by re-expressing parts of the model in terms of human-understandable semantic units, or concepts. Checking whether the derived explanations are faithful -- that is, whether they represent the model's internal computation -- requires a surrogate that combines the concepts to compute the output. Simplifications made for interpretability inevitably reduce faithfulness, resulting in a tradeoff between the two. State-of-the-art unsupervised CBEMs (U-CBEMs) appear to be more interpretable while also being more faithful to the model. However, we observe that the reported improvements in faithfulness arise artificially from either (1) using overly complex surrogates, which introduces an unmeasured cost to the explanation's interpretability, or (2) relying on deletion-based approaches that, as we demonstrate, do not properly measure faithfulness. We propose Surrogate Faithfulness (SURF), which (1) replaces prior complex surrogates with a simple, linear surrogate ...
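Since the abstract is truncated, the following is only a minimal sketch of the general idea it describes -- a simple linear surrogate from concepts to outputs, scored by how well it reproduces the model's predictions -- and not the paper's SURF method. All function names, the toy data, and the argmax-agreement metric are illustrative assumptions.

```python
# Hypothetical sketch: fit a least-squares linear map from concept activations
# to the explained model's logits, then score "faithfulness" as the fraction of
# inputs where the surrogate's predicted class matches the model's. This is an
# assumed formulation for illustration, not the paper's SURF implementation.
import numpy as np

def fit_linear_surrogate(concepts: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Solve min_W ||[C, 1] W - Y||^2 for a linear map with bias.

    concepts: (n_samples, n_concepts) concept activations per input.
    logits:   (n_samples, n_classes) outputs of the model being explained.
    """
    design = np.hstack([concepts, np.ones((concepts.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, logits, rcond=None)
    return weights

def surrogate_agreement(concepts: np.ndarray, logits: np.ndarray,
                        weights: np.ndarray) -> float:
    """Fraction of inputs where surrogate argmax matches the model argmax."""
    design = np.hstack([concepts, np.ones((concepts.shape[0], 1))])
    surrogate_logits = design @ weights
    return float(np.mean(surrogate_logits.argmax(1) == logits.argmax(1)))

# Toy usage with random data standing in for real concept activations.
rng = np.random.default_rng(0)
C = rng.normal(size=(500, 20))                                        # 20 concepts
Y = C @ rng.normal(size=(20, 10)) + 0.1 * rng.normal(size=(500, 10))  # 10 classes
W = fit_linear_surrogate(C, Y)
print(f"argmax agreement: {surrogate_agreement(C, Y, W):.3f}")
```

Under this reading, a more complex surrogate (e.g., a nonlinear model in place of the linear map) would mechanically raise the agreement score while making the explanation itself harder to interpret, which is the unmeasured cost the abstract points to.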