[2601.12104] Powerful Training-Free Membership Inference Against Autoregressive Language Models
Computer Science > Computation and Language
arXiv:2601.12104 (cs)
[Submitted on 17 Jan 2026 (v1), last revised 13 Apr 2026 (this version, v2)]

Title: Powerful Training-Free Membership Inference Against Autoregressive Language Models
Authors: David Ilić, David Stanojević, Kostadin Cvejoski

Abstract: Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any kind. On WikiText with GPT-2, EZ-MIA achieves 3.8x higher detection than the previous state-of-the-art under identical condit...
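The abstract's description of the EZ score can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `ez_score`, the input format (per-token log-probabilities and predictions from the target and reference models), and the particular imbalance statistic (fraction of positive minus negative shifts at error positions) are all assumptions made for illustration; the paper's exact statistic may differ.

```python
def ez_score(target_logprobs, reference_logprobs, target_preds, true_tokens):
    """Hypothetical sketch of an Error-Zone-style membership score.

    Restricts attention to 'error positions' (tokens the target model
    predicts incorrectly) and measures the directional imbalance of
    log-probability shifts there, relative to a reference model.
    Inputs are parallel per-token lists; all names are illustrative.
    """
    shifts = []
    for lp_t, lp_r, pred, tok in zip(
        target_logprobs, reference_logprobs, target_preds, true_tokens
    ):
        if pred != tok:  # error position: top-1 prediction is wrong
            # shift of the true token's log-probability vs. the reference
            shifts.append(lp_t - lp_r)
    if not shifts:
        return 0.0
    # Directional imbalance in [-1, 1]: +1 means the target model always
    # assigns the true token more probability than the reference does,
    # even where it predicts incorrectly (suggestive of memorization).
    pos = sum(1 for s in shifts if s > 0)
    neg = sum(1 for s in shifts if s < 0)
    return (pos - neg) / len(shifts)
```

In practice the per-token log-probabilities would come from one forward pass of each model over the query text, matching the abstract's claim of two forward passes per query and no training; a threshold on the resulting score would then separate members from non-members.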