[2603.00811] Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Computer Science > Machine Learning

arXiv:2603.00811 (cs) [Submitted on 28 Feb 2026]

Title: Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning

Authors: Dariush Wahdany (1), Matthew Jagielski (2), Adam Dziedzic (1), Franziska Boenisch (1) ((1) CISPA Helmholtz Center for Information Security, (2) Anthropic)

Abstract: In machine learning, curation selects the most valuable data to improve both model accuracy and computational efficiency. Recently, curation has also been explored as a route to private machine learning: rather than training directly on sensitive data, which is known to leak information through model predictions, the private data is used only to guide the selection of useful public data, and the resulting model is trained solely on that curated public data. It is tempting to assume that such a model is privacy-preserving because it has never seen the private data. Yet we show that, without further protection, curation pipelines can still leak private information. Specifically, we introduce novel attacks against popular curation methods, targeting every major step: the computation of curation scores, the selection of the curated subset, and the final trained model. We demonstrate that each stage reveals information about the pri...
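To make the leakage channel concrete, here is a minimal toy sketch (not the paper's actual attacks) of how curation scores alone can reveal membership. All data, the similarity-based scoring function, and the 10-nearest-neighbor statistic are hypothetical stand-ins: public points are scored by their average similarity to a private set, and an attacker who observes the scores can detect whether a particular target record was in that private set, because scores of public points near the target shift when the target is present.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all data hypothetical): private records guide which public
# points get selected, as in curation-based "private" training pipelines.
private_data = rng.normal(0.0, 1.0, size=(50, 8))
public_data = rng.normal(0.0, 1.0, size=(200, 8))
target = private_data[0]  # candidate whose membership the attacker tests

def curation_scores(private, public):
    """Score each public point by mean similarity (negative mean distance)
    to the private set -- a stand-in for data-valuation-style scores."""
    dists = np.linalg.norm(public[:, None, :] - private[None, :, :], axis=2)
    return -dists.mean(axis=1)  # higher = more similar to the private data

# Curation scores when the target is in / not in the private set.
scores_in = curation_scores(private_data, public_data)
scores_out = curation_scores(private_data[1:], public_data)

# Distinguishing statistic: the scores of the public points closest to the
# target increase when the target is a member, leaking its membership.
near_target = np.linalg.norm(public_data - target, axis=1).argsort()[:10]
shift = float((scores_in[near_target] - scores_out[near_target]).mean())
print(f"mean score shift near target: {shift:.4f}")
```

A positive shift indicates membership; in a real attack the threshold would be calibrated on shadow datasets. The paper's point is that such signals persist at every pipeline stage, including the curated subset and the final model, unless extra protection (e.g. differential privacy) is applied.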