[2602.16800] Large-scale online deanonymization with LLMs
Summary
This paper shows that large language models (LLMs) can deanonymize online users at scale, re-identifying pseudonymous profiles across platforms with high precision.
Why It Matters
As online privacy concerns grow, this research highlights the vulnerabilities of pseudonymous identities in the digital space. The findings suggest that existing privacy protections may be inadequate, prompting a reevaluation of threat models and privacy strategies in online environments.
Key Takeaways
- LLMs can effectively deanonymize users with high precision using unstructured text.
- Unlike prior deanonymization attacks (e.g., on the Netflix Prize) that required structured data or manual feature engineering, the approach works directly on raw, unstructured user content.
- Three datasets were constructed to validate the effectiveness of LLMs in deanonymization.
- The results indicate that pseudonymous online identities offer far weaker protection than commonly assumed.
- Privacy threat models must be reconsidered in light of these findings.
Computer Science > Cryptography and Security
arXiv:2602.16800 (cs) — Submitted on 18 Feb 2026
Title: Large-scale online deanonymization with LLMs
Authors: Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr
Abstract: We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-...