[2603.23146] Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
Computer Science > Computation and Language

arXiv:2603.23146 (cs)

[Submitted on 24 Mar 2026]

Title: Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

Authors: Shushanta Pudasaini, Luis Miralles-Pechuán, David Lillis, Marisa Llorens Salvador

Abstract: The widespread adoption of Large Language Models (LLMs) has made the detection of AI-generated text a pressing and complex challenge. Although many detection systems report high benchmark accuracy, their reliability in real-world settings remains uncertain, and their interpretability is often unexplored. In this work, we investigate whether contemporary detectors genuinely identify machine authorship or merely exploit dataset-specific artefacts. We propose an interpretable detection framework that integrates linguistic feature engineering, machine learning, and explainable AI techniques. Evaluated on two prominent benchmark corpora, PAN CLEF 2025 and COLING 2025, our model trained on 30 linguistic features achieves leaderboard-competitive performance, attaining an F1 score of 0.9734. However, systematic cross-domain and cross-generator evaluation reveals substantial generalisation failure: classifiers that excel in-domain degrade significantly under distribution shift. Using SHAP-...
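The abstract describes a three-stage pipeline: hand-crafted linguistic features, a supervised classifier, and SHAP-based attribution. Below is a minimal sketch of how such a pipeline fits together. The three features, the gradient-boosting model, and the toy corpus are illustrative assumptions only; the paper's actual 30 features, model choice, and benchmark data are not given in this excerpt.

```python
"""Sketch: linguistic features -> classifier -> SHAP attribution.
Illustrative only; not the authors' exact setup."""
import numpy as np
import shap  # pip install shap
from sklearn.ensemble import GradientBoostingClassifier

FEATURE_NAMES = ["avg_sentence_len", "type_token_ratio", "avg_word_len"]

def extract_features(text):
    """Three hypothetical stand-ins for the paper's 30 linguistic features."""
    words = text.split()
    sents = [s for s in text.replace("?", ".").replace("!", ".").split(".") if s.strip()]
    return [
        len(words) / max(len(sents), 1),                       # average sentence length
        len({w.lower() for w in words}) / max(len(words), 1),  # lexical diversity
        sum(map(len, words)) / max(len(words), 1),             # average word length
    ]

# Toy corpus; in practice, load PAN CLEF 2025 / COLING 2025 texts and labels.
human_texts = [
    "I scribbled this note quickly; pardon the odd phrasing and typos.",
    "Honestly, the meeting ran long. We argued, laughed, then finally agreed.",
    "Rain again today. Walked anyway, got soaked, regretted nothing at all.",
]
ai_texts = [
    "The proposed approach demonstrates consistent improvements across all settings.",
    "In conclusion, the results highlight the importance of robust evaluation.",
    "Overall, this framework provides a comprehensive solution to the task.",
]
X = np.array([extract_features(t) for t in human_texts + ai_texts])
y = np.array([0] * len(human_texts) + [1] * len(ai_texts))  # 0 = human, 1 = AI

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values show which features drive each prediction; averaging their
# magnitudes gives a global importance ranking for the feature set.
shap_values = shap.TreeExplainer(clf).shap_values(X)  # shape (n_samples, n_features)
for name, score in zip(FEATURE_NAMES, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: mean |SHAP| = {score:.3f}")
```

Inspecting per-feature SHAP attributions in this way is what lets the authors ask whether a high-accuracy detector relies on genuine markers of machine authorship or on dataset-specific artefacts that fail to transfer across domains and generators.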