[2602.15298] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
Summary
The paper presents X-MAP, a framework for analyzing and profiling misclassifications in spam and phishing detection, enhancing interpretability and detection accuracy.
Why It Matters
As spam and phishing attacks continue to evolve, effective detection is crucial for user safety. Misclassifications are costly in both directions: false negatives expose users to attacks, while false positives degrade trust. X-MAP addresses this by providing a transparent analysis of model failures and improving detection reliability.
Key Takeaways
- X-MAP combines SHAP-based feature attributions with non-negative matrix factorization for topic profiling.
- The framework lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions.
- Used as a detector, X-MAP achieves an AUROC of up to 0.98.
- Misclassified messages show at least two times larger Jensen-Shannon divergence from the topic profiles than correctly classified ones, which helps flag likely errors.
- The framework can serve as a repair layer for existing detection systems, recovering many falsely rejected messages.
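The divergence idea in the takeaways above can be illustrated with a small sketch. It assumes each message has already been reduced to a vector of non-negative topic weights (in X-MAP these come from SHAP attributions factorized with NMF; that step is not shown here), and it compares a message against a class profile using Jensen-Shannon divergence. The function names and all numbers are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (a probability distribution)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic weights (assumed values, not taken from the paper).
profile = normalize([6.0, 3.0, 1.0])    # topic profile of reliably classified messages
typical = normalize([5.0, 4.0, 1.0])    # message whose topics resemble the profile
atypical = normalize([0.5, 1.0, 8.5])   # message dominated by an unusual topic

print("typical :", js_divergence(typical, profile))
print("atypical:", js_divergence(atypical, profile))
```

A message whose divergence from its predicted class profile is unusually large would be the kind of candidate the framework flags as a possible misclassification; how the threshold is chosen is not specified in this excerpt.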
Computer Science > Artificial Intelligence
arXiv:2602.15298 (cs)
[Submitted on 17 Feb 2026]
Title: X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
Authors: Qi Zhang, Dian Chen, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho
Abstract: Misclassifications in spam and phishing detection are very harmful, as false negatives expose users to attacks while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least two times larger divergence than correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair l...
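To make the AUROC figure in the abstract concrete, the sketch below shows one way a divergence score could be evaluated as a misclassification detector. It uses the pairwise (Mann-Whitney) definition of AUROC: the probability that a randomly chosen misclassified message receives a higher score than a randomly chosen correctly classified one. The scores and labels are made up for illustration, and the paper's actual evaluation protocol is not described in this excerpt.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for the positive class (here: misclassified) and 0 otherwise.
    Tied scores count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical divergence scores and misclassification labels (assumed data).
divergences = [0.05, 0.10, 0.08, 0.30, 0.07, 0.09]
misclassified = [0, 0, 0, 1, 0, 1]

print("AUROC:", auroc(divergences, misclassified))
```

The double loop is quadratic in the number of messages, which is fine for a small illustration; a rank-based computation would be the usual choice for larger evaluation sets.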