[2602.22882] Fair feature attribution for multi-output prediction: a Shapley-based perspective
Summary
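This article gives an axiomatic characterization of feature attribution for multi-output predictors within the Shapley framework. It proves that any attribution rule satisfying the classical Shapley axioms must decompose component-wise across outputs, so a genuinely joint-output attribution rule has to relax at least one of those axioms.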
Why It Matters
SHAP explanations for multi-output models are routinely computed independently for each output coordinate, but whether this practice is theoretically necessary has been unclear. This research shows that it follows from the classical Shapley fairness axioms (efficiency, symmetry, dummy player, and additivity), which clarifies what Shapley-based explanations can and cannot express about joint outputs.
Key Takeaways
- Extends the classical Shapley axioms to vector-valued cooperative games in order to study feature attribution in multi-output models.
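- Establishes a rigidity theorem: any attribution rule satisfying efficiency, symmetry, dummy player, and additivity must decompose component-wise across outputs, so joint-output attribution requires relaxing at least one of these axioms.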
- Identifies structural constraints in Shapley-based interpretability.
- Illustrates, through numerical experiments on a biomedical benchmark, that multi-output models can yield computational savings in training and deployment.
- Clarifies the precise scope of fairness-consistent explanations in multi-output learning.
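To make the per-output view concrete, here is a minimal sketch of exact Shapley values for a small vector-valued cooperative game. The function names (`shapley_values`, `toy_game`) and the 3-player, 2-output game are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows what component-wise decomposition means in practice (applying the classical formula to the whole output vector matches applying it to each coordinate separately); it does not reproduce the rigidity theorem or the paper's experiments.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(value_fn, n_players, n_outputs):
    """Exact Shapley values of a vector-valued cooperative game.

    value_fn maps a frozenset of player indices to a sequence of
    n_outputs floats. Returns phi with phi[i][k] = attribution of
    player i for output k. Cost grows exponentially in n_players.
    """
    phi = [[0.0] * n_outputs for _ in range(n_players)]
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(n_players):
            # Classical Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for subset in combinations(others, size):
                s = frozenset(subset)
                with_i, without_i = value_fn(s | {i}), value_fn(s)
                for k in range(n_outputs):
                    phi[i][k] += weight * (with_i[k] - without_i[k])
    return phi


def toy_game(coalition):
    """Hypothetical 3-player, 2-output game (illustrative only)."""
    weights = [1.0, 2.0, 3.0]
    out0 = sum(weights[p] for p in coalition)    # additive in the players
    out1 = 1.0 if {0, 1} <= coalition else 0.0   # needs players 0 and 1; player 2 is a dummy
    return (out0, out1)


# Apply the formula to the whole output vector at once.
phi_joint = shapley_values(toy_game, n_players=3, n_outputs=2)

# Apply the scalar formula to each output coordinate separately.
phi_separate = [
    shapley_values(lambda s, k=k: (toy_game(s)[k],), n_players=3, n_outputs=1)
    for k in range(2)
]

# Both routes agree entry by entry.
for i in range(3):
    for k in range(2):
        assert isclose(phi_joint[i][k], phi_separate[k][i][0], abs_tol=1e-12)

# Efficiency holds per coordinate: attributions sum to v(N) - v(empty set).
grand, empty = toy_game(frozenset({0, 1, 2})), toy_game(frozenset())
for k in range(2):
    assert isclose(sum(row[k] for row in phi_joint), grand[k] - empty[k])
```

The exhaustive enumeration over coalitions is only practical for a handful of players; it is used here because it keeps the connection to the axioms visible.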
Computer Science > Machine Learning
arXiv:2602.22882 (cs)
[Submitted on 26 Feb 2026]
Title: Fair feature attribution for multi-output prediction: a Shapley-based perspective
Authors: Umberto Biccari, Alain Ibáñez de Opakua, José María Mato, Óscar Millet, Roberto Morales, Enrique Zuazua
Abstract: In this article, we provide an axiomatic characterization of feature attribution for multi-output predictors within the Shapley framework. While SHAP explanations are routinely computed independently for each output coordinate, the theoretical necessity of this practice has remained unclear. By extending the classical Shapley axioms to vector-valued cooperative games, we establish a rigidity theorem showing that any attribution rule satisfying efficiency, symmetry, dummy player, and additivity must necessarily decompose component-wise across outputs. Consequently, any joint-output attribution rule must relax at least one of the classical Shapley axioms. This result identifies a previously unformalized structural constraint in Shapley-based interpretability, clarifying the precise scope of fairness-consistent explanations in multi-output learning. Numerical experiments on a biomedical benchmark illustrate that multi-output models can yield computational savings in training and deployment, while producing SH...
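For reference, the component-wise form discussed in the abstract can be written out explicitly. The notation below is an assumption made for illustration and may differ from the paper's: $N$ is the set of features (players) with $n = |N|$, $v : 2^N \to \mathbb{R}^m$ is a vector-valued game with coordinates $v_1, \dots, v_m$, and $\phi_i(v)$ is the attribution of feature $i$. Read together with the classical uniqueness of the Shapley value for scalar games, component-wise decomposition suggests a rule of this form:

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}\,
  \bigl( v(S \cup \{i\}) - v(S) \bigr) \;\in\; \mathbb{R}^m,
\qquad
\bigl( \phi_i(v) \bigr)_k \;=\; \phi_i(v_k), \quad k = 1, \dots, m.
```

The second identity is the decomposition itself: the $k$-th entry of the vector attribution equals the scalar Shapley value of feature $i$ in the game $v_k$ alone, so no entry depends on the other output coordinates.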