[2507.10846] Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization
Summary
Winsor-CAM introduces a novel method for generating visual explanations from deep networks, improving interpretability through a human-tunable percentile parameter and stronger localization.
Why It Matters
As deep learning models are increasingly used in critical applications like healthcare, understanding their decision-making processes is vital. Winsor-CAM offers a robust solution for generating visual explanations, making it easier for experts to analyze and trust AI systems.
Key Takeaways
- Winsor-CAM aggregates Grad-CAM maps from all convolutional layers for better interpretability.
- The method allows users to tune explanations based on a percentile parameter, enhancing semantic relevance.
- Evaluation shows Winsor-CAM outperforms existing methods like Grad-CAM in localization and fidelity metrics.
- The approach is particularly effective in medical imaging contexts, improving expert analysis.
- Incorporating earlier layers in CNNs significantly enhances localization performance.
Computer Science > Computer Vision and Pattern Recognition — arXiv:2507.10846 (cs)
[Submitted on 14 Jul 2025 (v1), last revised 20 Feb 2026 (this version, v2)]
Authors: Casey Wall, Longwei Wang, Rodrigue Rizk, KC Santosh
Abstract: Interpreting Convolutional Neural Networks (CNNs) is critical for safety-sensitive applications such as healthcare and autonomous systems. Popular visual explanation methods like Grad-CAM use a single convolutional layer, potentially missing multi-scale cues and producing unstable saliency maps. We introduce Winsor-CAM, a single-pass gradient-based method that aggregates Grad-CAM maps from all convolutional layers and applies percentile-based Winsorization to attenuate outlier contributions. A user-controllable percentile parameter p enables semantic-level tuning from low-level textures to high-level object patterns. We evaluate Winsor-CAM on six CNN architectures using PASCAL VOC 2012 and PolypGen, comparing localization (IoU, center-of-mass distance) and fidelity (insertion/deletion AUC) against seven baselines including Grad-CAM, Grad-CAM++, LayerCAM, ScoreCAM, AblationCAM, ShapleyCAM, and FullGrad. On DenseNet121 with a subset of Pascal VOC 2012, Winsor-CAM achie...
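To make the core idea concrete, here is a minimal NumPy sketch of layer-wise Winsorized aggregation. This is a hypothetical illustration, not the authors' implementation: it assumes the per-layer Grad-CAM maps have already been computed and upsampled to a common spatial size, that Winsorization clips the upper tail at the p-th percentile, and that fusion is a simple mean of per-layer min-max-normalized maps (the paper's exact weighting may differ).

```python
import numpy as np

def winsorize_map(cam, p):
    """Clip a saliency map at its p-th percentile to attenuate outlier activations."""
    hi = np.percentile(cam, p)
    return np.clip(cam, None, hi)

def winsor_aggregate(layer_cams, p=95.0):
    """Fuse per-layer Grad-CAM maps (all the same spatial size) by
    Winsorizing each at percentile p, min-max normalizing, and averaging."""
    fused = np.zeros_like(layer_cams[0], dtype=float)
    for cam in layer_cams:
        w = winsorize_map(cam.astype(float), p)
        rng = w.max() - w.min()
        if rng > 0:
            w = (w - w.min()) / rng  # per-layer min-max normalization
        fused += w
    return fused / len(layer_cams)
```

Lowering p clips more of each layer's high-activation tail, which (after normalization) spreads saliency toward lower-level cues; raising p leaves the dominant high-level activations intact — this is the tuning knob the abstract describes.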