[2604.07486] Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation
arXiv:2604.07486 [cs.CR]
[Submitted on 8 Apr 2026 (v1), last revised 11 Apr 2026 (this version, v2)]

Authors: Qian Ma, Sarah Rajtmajer

Abstract: Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which uses private seeds and integrates privacy-preserving strategies, including a formal differential privacy (DP) mechanism in the candidate selection, to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to private data while providing strong privacy protection.

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.07486 [cs.CR] (or arXiv:2604.07486v2 [cs.CR] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.07486 (arXiv-issued DOI via DataCite)

Submission history
From: Qian Ma
[v1] Wed, 8 Apr 2026 18:26:34 ...
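The abstract states that RPSG applies a formal DP mechanism during candidate selection but does not say which one. As a hedged illustration only, the sketch below shows one standard way to make a "pick one candidate using private data" step differentially private: the exponential mechanism, which samples candidate c with probability proportional to exp(ε·u(c) / (2Δu)). Under the stated sensitivity bound Δu, this gives ε-DP for that single selection. Everything here is an assumption for illustration, not the paper's method: the helper `exponential_mechanism`, the toy utility `match_count` (how many private seeds share a word with a candidate), and the example seeds and candidates are all hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def exponential_mechanism(
    candidates: Sequence[T],
    utility: Callable[[T], float],
    epsilon: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> T:
    """Hypothetical helper (not the RPSG implementation): select one candidate
    with probability proportional to exp(epsilon * utility(c) / (2 * sensitivity))."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    # Subtracting the max score before exponentiating avoids overflow; it
    # scales every weight by the same factor, so the distribution is unchanged.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


# Hypothetical toy data: "private" seed texts and public LLM candidates.
seeds = ["refund request late delivery", "password reset help", "refund for damaged item"]
candidates = ["I want a refund", "How do I reset my password?", "Weather today"]


def match_count(candidate: str) -> int:
    # Hypothetical utility: number of seeds sharing at least one word with the
    # candidate. Adding or removing one seed changes it by at most 1, so Δu = 1.
    words = set(candidate.lower().split())
    return sum(1 for s in seeds if words & set(s.split()))


choice = exponential_mechanism(candidates, match_count, epsilon=1.0, rng=random.Random(42))
print(choice)
```

Smaller ε flattens the sampling distribution (more privacy, less fidelity to the seeds), while larger ε concentrates probability on the highest-utility candidate. The mechanism's privacy guarantee depends entirely on the sensitivity bound being correct for the chosen utility function.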