[2602.17530] Provably Explaining Neural Additive Models
Summary
The paper presents a novel algorithm for generating provably minimal explanations for Neural Additive Models (NAMs), making such explanations both computationally efficient and formally guaranteed rather than heuristic.
Why It Matters
As machine learning models become more complex, the need for interpretable AI grows. This research addresses the challenge of providing explanations with provable guarantees, enhancing trust and understanding in AI systems, particularly in critical applications where transparency is essential.
Key Takeaways
- Introduces a new algorithm for generating provably minimal explanations for Neural Additive Models.
- Achieves efficient explanations using only a logarithmic number of verification queries in the number of input features, significantly reducing computation time.
- Outperforms existing algorithms in both explanation size and computational efficiency.
- Addresses the limitations of heuristic post-hoc explanation methods in neural networks.
- Enhances interpretability of NAMs, making them more applicable in real-world scenarios.
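To ground the takeaways above, here is a minimal illustrative sketch of the model family in question. A Neural Additive Model predicts via a sum of learned univariate "shape" functions, one per input feature, plus a bias; this additivity is what makes the explanation problem tractable. The code below is a toy with random fixed weights, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shape_fn(hidden=8):
    # Tiny one-hidden-layer ReLU network mapping a scalar feature
    # to a scalar contribution (a toy stand-in for a trained NAM component).
    w1 = rng.normal(size=hidden)
    b1 = rng.normal(size=hidden)
    w2 = rng.normal(size=hidden)
    def g(x):
        h = np.maximum(0.0, x * w1 + b1)  # ReLU hidden layer
        return float(h @ w2)
    return g

n_features = 4
shape_fns = [make_shape_fn() for _ in range(n_features)]
bias = 0.1

def nam_predict(x):
    # Additivity: the output decomposes feature-by-feature, so each
    # feature's contribution can be analyzed by a small univariate query.
    return bias + sum(g(xi) for g, xi in zip(shape_fns, x))

x = np.array([0.5, -1.2, 0.3, 2.0])
score = nam_predict(x)
```

Because the output decomposes feature-by-feature, reasoning about one feature's influence only requires analyzing one small univariate network, which is what the paper's preprocessing step exploits.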
Paper Details
Computer Science > Machine Learning · arXiv:2602.17530 (cs)
Submitted on 19 Feb 2026
Title: Provably Explaining Neural Additive Models
Authors: Shahaf Bassan, Yizhak Yisrael Elboher, Tobias Ladner, Volkan Şahin, Jan Kretinsky, Matthias Althoff, Guy Katz
Abstract: Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarantees is by identifying a cardinally-minimal subset of input features which by itself is provably sufficient to determine the prediction. However, for standard neural networks, this task is often computationally infeasible, as it demands a worst-case exponential number of verification queries in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using only a logarithmic number of verification queries in the number of input features, after a parallelized preprocessing step with logarithmic runtime in the required precision is applied to each small univariate NAM component. Our algorithm not only makes the task of obtaining...
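The abstract's core idea can be sketched concretely, under stated assumptions (the function names and setup here are illustrative, not taken from the paper). Suppose the model decides positively when bias + Σᵢ gᵢ(xᵢ) > 0, and a preprocessing step has bounded each univariate component gᵢ from below over its domain, giving lo[i]. A subset of features is sufficient if fixing them, while every free feature drops to its worst-case lower bound, still keeps the score positive. Because contributions are independent, the minimal sufficient set is a prefix of the features sorted by impact, so its size can be found by binary search, using only O(log n) sufficiency checks (each check in the paper is a verification query; here it is simple arithmetic):

```python
import numpy as np

def minimal_sufficient_subset(contribs, lo, bias):
    """Smallest set of features to fix so a positive prediction is
    guaranteed no matter how the remaining features vary.

    contribs[i] = g_i(x_i) for the instance being explained;
    lo[i] = a lower bound on g_i over the feature's whole domain.
    """
    n = len(contribs)
    # Fixing feature i recovers contribs[i] - lo[i] of worst-case slack.
    slack = contribs - lo
    order = np.argsort(-slack)  # fix highest-impact features first

    def sufficient(k):
        # Worst case with the top-k features fixed: all others at lo.
        worst = bias + lo.sum() + slack[order[:k]].sum()
        return worst > 0

    # Additivity makes sufficient(k) monotone in k, so binary-search
    # the minimal prefix length with O(log n) checks.
    lo_k, hi_k = 0, n
    while lo_k < hi_k:
        mid = (lo_k + hi_k) // 2
        if sufficient(mid):
            hi_k = mid
        else:
            lo_k = mid + 1
    return sorted(order[:lo_k].tolist())

# Toy instance: 4 features, overall score 0.2 + 2.0 + 0.5 - 0.3 + 1.0 > 0.
contribs = np.array([2.0, 0.5, -0.3, 1.0])
lo = np.array([-1.0, -0.5, -0.5, -2.0])
explanation = minimal_sufficient_subset(contribs, lo, bias=0.2)
# → [0, 3]: fixing features 0 and 3 alone guarantees the prediction.
```

This is only the skeleton of the approach: the paper's contribution is doing the per-feature bounding provably on real NAM components (via verification with logarithmic runtime in the required precision) rather than assuming the bounds are given.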