[2601.04378] Aligned explanations in neural networks
Computer Science > Machine Learning

arXiv:2601.04378 (cs)
[Submitted on 7 Jan 2026 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: Aligned explanations in neural networks
Authors: Corentin Lobet, Francesca Chiaromonte

Abstract: Feature attribution is the dominant paradigm for explaining the predictions of complex machine learning models such as neural networks. However, most existing methods offer little guarantee of reflecting the model's prediction-making process. We define the notion of explanatory alignment and argue that it is central to trustworthy predictive modeling: in short, it requires that explanations directly underlie predictions rather than serve as rationalizations. We present model readability as a design principle enabling alignment, and Pointwise-interpretable Networks (PiNets) as a modeling framework to pursue it in a deep learning context. PiNets combine statistical intelligence with a pseudo-linear structure that yields instance-wise linear predictions in an arbitrary feature space. We illustrate their use on image classification and segmentation tasks, demonstrating that PiNets produce explanations that are not only aligned by design but also faithful across other dimensions: meaningfulness, robustness, and sufficiency.

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machin...
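The "pseudo-linear structure" described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical weight network `g` that maps an input to instance-specific coefficients, and a hypothetical feature map `phi`; the prediction is then the inner product of the two, so the coefficients themselves serve as the aligned explanation.

```python
def phi(x):
    """Hypothetical feature map: raw features plus a bias term."""
    return x + [1.0]

def g(x):
    """Hypothetical weight network producing instance-wise coefficients w(x).
    In an actual PiNet this would be a deep network; here it is a toy affine map."""
    return [0.5 * fi + 0.1 for fi in phi(x)]

def predict(x):
    """Pseudo-linear prediction: linear in phi(x), with weights that depend on x."""
    w = g(x)   # instance-wise coefficients w(x)
    f = phi(x) # features phi(x)
    return sum(wi * fi for wi, fi in zip(w, f)), w

y, w = predict([1.0, 2.0])
# Because the final step is linear in phi(x), each term w[i] * phi(x)[i]
# is an exact additive contribution to the prediction y -- the explanation
# directly underlies the prediction rather than rationalizing it post hoc.
contributions = [wi * fi for wi, fi in zip(w, phi([1.0, 2.0]))]
```

In this toy case `phi([1.0, 2.0]) = [1.0, 2.0, 1.0]` and `w = [0.6, 1.1, 0.6]`, so the contributions sum exactly to the prediction, which is the alignment property the abstract emphasizes.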