[2604.00419] G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs
Computer Science > Machine Learning
arXiv:2604.00419 (cs) [Submitted on 1 Apr 2026]

Title: G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs
Authors: Ravi Ranjan, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou

Abstract: Large language models (LLMs) are trained on massive web-scale corpora, raising growing concerns about privacy and copyright. Membership inference attacks (MIAs) aim to determine whether a given example was used during training. Existing LLM MIAs largely rely on output probabilities or loss values and often perform only marginally better than random guessing when members and non-members are drawn from the same distribution. We introduce G-Drift MIA, a white-box membership inference method based on gradient-induced feature drift. Given a candidate (x,y), we apply a single targeted gradient-ascent step that increases its loss and measure the resulting changes in internal representations, including logits, hidden-layer activations, and projections onto fixed feature directions, before and after the update. These drift signals are used to train a lightweight logistic classifier that effectively separates members from non-members. Across multiple transformer-based LLMs and datasets derived from realistic MIA benchmarks, G-Drift substantially outperforms confidence-based,...
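The core drift signal described in the abstract can be illustrated on a toy model. The sketch below is not the authors' implementation: it uses a hand-built linear model in NumPy (names such as `drift_features`, the learning rate, and the squared loss are illustrative assumptions), but it shows the intuition the abstract states: a training example ("member") that the model already fits well yields a near-zero gradient, so one gradient-ascent step barely moves its output, while an unseen example drifts noticeably.

```python
import numpy as np

def drift_features(w, x, y, lr=0.5):
    """One gradient-ascent step on the squared loss of a toy linear
    model, then the magnitude of the resulting output ('logit') drift.
    Illustrative stand-in for the per-example drift signal in G-Drift;
    not the paper's actual procedure."""
    logit_before = w @ x
    # gradient of L = 0.5 * (w@x - y)^2 with respect to w
    grad = (logit_before - y) * x
    w_up = w + lr * grad           # ascent step: increases the loss
    logit_after = w_up @ x
    return abs(logit_after - logit_before)

rng = np.random.default_rng(0)
d = 8
y = 1.0

# 'member': construct w so the model fits (x_member, y) exactly,
# mimicking a low-loss training example
x_member = rng.normal(size=d)
w = x_member * y / (x_member @ x_member)

# 'non-member': a fresh example the model has never fit
x_nonmember = rng.normal(size=d)

drift_m = drift_features(w, x_member, y)     # near zero
drift_n = drift_features(w, x_nonmember, y)  # strictly larger
```

In the paper's setting these scalar drifts are replaced by richer signals (logit shifts, hidden-activation changes, projections onto fixed directions) collected per example and fed to a logistic classifier; the toy model only demonstrates why drift magnitude can separate members from non-members at all.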