[2603.01053] Turning Black Box into White Box: Dataset Distillation Leaks
Computer Science > Cryptography and Security

arXiv:2603.01053 (cs)
[Submitted on 1 Mar 2026]

Title: Turning Black Box into White Box: Dataset Distillation Leaks
Authors: Huajie Chen, Tianqing Zhu, Yuchen Zhong, Yang Zhang, Shang Wang, Feng He, Lefeng Zhang, Jialiang Shen, Minghao Wang, Wanlei Zhou

Abstract: Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and the model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2603.01053 [cs.CR] (or arXiv:2603.01053v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2603...
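To make the membership-inference risk mentioned in the abstract concrete, the sketch below shows a generic loss-threshold membership attack on an overfit model. This is an illustration of the attack class, not the paper's IRA: the data, model, and threshold rule are all toy assumptions chosen so the script runs standalone. The intuition matches the abstract's claim that training artifacts can be over-informative: samples the model was trained on tend to have markedly lower loss than fresh samples, so a simple threshold on per-sample loss already distinguishes members from non-members.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a private dataset: two weakly separated Gaussian blobs
# in a high dimension, so a linear model can overfit its training set.
def make_blobs(n_per_class, dim=50):
    x0 = rng.normal(loc=-0.1, scale=1.0, size=(n_per_class, dim))
    x1 = rng.normal(loc=+0.1, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

X_mem, y_mem = make_blobs(20)   # "members": the model's training samples
X_non, y_non = make_blobs(20)   # "non-members": fresh draws, same distribution

# Train a small logistic-regression model on the member set only.
w, b = np.zeros(X_mem.shape[1]), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X_mem @ w + b)))
    w -= 0.5 * (X_mem.T @ (p - y_mem)) / len(y_mem)
    b -= 0.5 * np.mean(p - y_mem)

def per_sample_loss(X, y):
    """Cross-entropy loss of each sample under the trained model."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_mem = per_sample_loss(X_mem, y_mem)
loss_non = per_sample_loss(X_non, y_non)

# Threshold attack: guess "member" whenever a sample's loss falls below
# the pooled median loss.
tau = np.median(np.concatenate([loss_mem, loss_non]))
acc = 0.5 * (np.mean(loss_mem < tau) + np.mean(loss_non >= tau))
print(f"mean member loss:     {loss_mem.mean():.4f}")
print(f"mean non-member loss: {loss_non.mean():.4f}")
print(f"attack accuracy:      {acc:.3f}")
```

In this toy setting the member losses collapse toward zero while non-member losses stay large, so the threshold attack beats random guessing. The paper's IRA goes further, exploiting the weight trajectories encoded in the distilled synthetic data rather than only final-model losses.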