[2602.09238] Feature salience -- not task-informativeness -- drives machine learning model explanations
Summary
This paper investigates which data properties drive feature importance in machine learning model explanations, finding that feature salience, rather than task-informativeness, dominates these attributions.
Why It Matters
Understanding the drivers of feature importance in machine learning is crucial for the development of explainable AI (XAI). This research challenges existing assumptions about how models attribute importance, suggesting that salience may play a more significant role than previously thought. This has implications for the reliability of XAI methods and their application in real-world scenarios.
Key Takeaways
- Feature salience significantly influences importance attribution in machine learning models.
- Task-informativeness may not be the primary driver of feature importance as previously assumed.
- The study highlights the need to reevaluate existing XAI applications and methodologies.
- Results indicate that salience can overshadow statistical associations in model explanations.
- Attribution methods should be scrutinized for their reliance on feature salience.
Computer Science > Machine Learning

arXiv:2602.09238 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v2)]

Title: Feature salience -- not task-informativeness -- drives machine learning model explanations
Authors: Benedict Clark, Marta Oliveira, Rick Wilming, Stefan Haufe

Abstract: Explainable AI (XAI) promises to provide insight into machine learning models' decision processes, where one goal is to identify failures such as shortcut learning. This promise relies on the field's assumption that input features marked as important by an XAI method must contain information about the target variable. However, it is unclear whether informativeness is indeed the main driver of importance attribution in practice, or whether other data properties, such as statistical suppression, novelty at test time, or high feature salience, contribute substantially. To clarify this, we trained deep learning models on three variants of a binary image classification task, in which translucent watermarks are either absent, act as class-dependent confounds, or represent class-independent noise. Results for five popular attribution methods show substantially elevated relative importance in watermarked areas (RIW) for all models regardless of the training setting ($R^2 \geq .45$). By contrast, whether the...
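The abstract's RIW metric can be read as the share of an attribution map's mass that falls inside the watermarked region. A minimal sketch of that reading, assuming a dense per-pixel attribution map and a boolean watermark mask (the function name and the absolute-value normalization are illustrative assumptions, not the authors' code):

```python
import numpy as np

def relative_importance_in_watermark(attribution, watermark_mask):
    """Fraction of total absolute attribution mass inside the watermarked
    region -- a simple reading of the paper's RIW metric; the authors'
    exact definition may differ."""
    magnitude = np.abs(attribution)
    total = magnitude.sum()
    if total == 0:
        return 0.0  # degenerate map: no attribution anywhere
    return float(magnitude[watermark_mask].sum() / total)

# Toy example: an 8x8 attribution map with a 2x2 watermark patch.
rng = np.random.default_rng(0)
attr = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[:2, :2] = True
riw = relative_importance_in_watermark(attr, mask)
```

Under this reading, a uniform attribution map yields an RIW equal to the watermark's area fraction, so values well above that baseline (as the paper reports across training settings) indicate attribution mass concentrated on the salient watermark.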