[2504.14094] Leakage and Interpretability in Concept-Based Models
Computer Science > Machine Learning

arXiv:2504.14094 (cs)

[Submitted on 18 Apr 2025 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: Leakage and Interpretability in Concept-Based Models

Authors: Enrico Parisini, Tapabrata Chakraborti, Chris Harbron, Ben D. MacArthur, Christopher R. S. Banerji

Abstract: Concept-based models aim to improve interpretability by predicting high-level intermediate concepts, representing a promising approach for deployment in high-risk scenarios. However, they are known to suffer from information leakage, whereby models exploit unintended information encoded within the learned concepts. We introduce an information-theoretic framework to rigorously characterise and quantify leakage, and define two complementary measures: the concepts-task leakage (CTL) and interconcept leakage (ICL) scores. We show that these measures are strongly predictive of model behaviour under interventions and outperform existing alternatives. Using this framework, we identify the primary causes of leakage and, as a case study, analyse how it manifests in Concept Embedding Models, revealing interconcept and alignment leakage in addition to the concepts-task leakage present by design. Finally, we present a set of practical guidelines for designing concept-based models to reduce leakage and ensure interpretability.
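The abstract does not reproduce the paper's definitions of the CTL and ICL scores, only that they are information-theoretic measures of unintended signal carried by learned concepts. The following is a minimal, hypothetical sketch of the underlying idea: estimating how much more mutual information a model's soft concept prediction shares with the task label than the ground-truth concept does, using scikit-learn's plug-in MI estimator. All names, the toy data, and the "excess MI" proxy are illustrative assumptions, not the authors' actual CTL/ICL construction.

```python
# Hypothetical sketch: probing concepts-task leakage via a plug-in
# mutual information estimate. NOT the paper's CTL/ICL definition;
# purely an illustration of the information-theoretic intuition.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Toy data: a binary ground-truth concept c, a soft concept prediction
# c_hat that also encodes an unintended extra signal (simulated
# leakage), and a task label y that depends on both.
n = 5000
c = rng.integers(0, 2, size=n)                      # ground-truth concept
extra = rng.normal(size=n)                          # unintended task signal
c_hat = c + 0.5 * extra + 0.1 * rng.normal(size=n)  # leaky soft concept
y = ((c + (extra > 0)) >= 1).astype(int)            # task label

# MI between the task label and (a) the clean concept, (b) the soft
# prediction. Excess MI in (b) over (a) is one crude leakage proxy.
mi_clean = mutual_info_classif(c.reshape(-1, 1), y, discrete_features=True)[0]
mi_soft = mutual_info_classif(c_hat.reshape(-1, 1), y)[0]

print(f"I(c; y)      ~ {mi_clean:.3f} nats")
print(f"I(c_hat; y)  ~ {mi_soft:.3f} nats")
print(f"excess MI (leakage proxy) ~ {mi_soft - mi_clean:.3f} nats")
```

On this toy data the soft concept's excess MI is strictly positive, reflecting the extra task signal it encodes beyond the concept it was meant to represent; an interconcept analogue would instead compare predicted concepts against each other.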