[2604.04195] Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach
Computer Science > Machine Learning
arXiv:2604.04195 (cs)
[Submitted on 5 Apr 2026]

Title: Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach

Authors: Gabriel Diaz Ramos, Lorenzo Luzi, Debshila Basu Mallick, Richard Baraniuk

Abstract: To advance Educational Data Mining (EDM) within strict privacy-protecting regulatory frameworks, researchers must develop methods that enable data-driven analysis while protecting sensitive student information. Synthetic data generation is one such approach, enabling the release of statistically generated samples instead of real student records; however, existing deep learning and parametric generators often distort marginal distributions and degrade under iterative regeneration, leading to distribution drift and progressive loss of distributional support that compromise reliability. In response, we introduce the Non-Parametric Gaussian Copula (NPGC), a plug-and-play synthesis method that replaces deep learning and parametric optimization with empirical statistical anchoring to preserve the observed marginal distributions while modeling dependencies through a copula framework. NPGC integrates Differential Privacy (DP) at both the marginal and correlation levels, supports heterogeneous...
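
The general idea of pairing empirical marginals with a Gaussian copula can be illustrated with a minimal sketch. This is not the authors' NPGC implementation, whose details the abstract does not give. It assumes purely numeric columns, omits the Differential Privacy steps entirely, and uses hypothetical function names (`normal_scores`, `correlation_matrix`, `cholesky`, `sample_synthetic`) and made-up toy data. Each column is mapped to normal scores through its ranks, the dependence is captured by the correlation of those scores, and sampled values are mapped back through the empirical quantiles, so every synthetic value is one that was actually observed.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_scores(column):
    """Map a column to standard-normal scores via its ranks (empirical CDF)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the uniform value strictly inside (0, 1).
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(score_columns):
    """Pearson correlation matrix of the normal-score columns."""
    d, n = len(score_columns), len(score_columns[0])
    means = [sum(c) / n for c in score_columns]

    def cov(a, b):
        return sum((score_columns[a][k] - means[a]) * (score_columns[b][k] - means[b])
                   for k in range(n))

    return [[cov(a, b) / math.sqrt(cov(a, a) * cov(b, b)) for b in range(d)]
            for a in range(d)]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_synthetic(columns, n_samples, seed=0):
    """Draw synthetic rows whose values come from the observed marginals."""
    rng = random.Random(seed)
    d, n = len(columns), len(columns[0])
    sorted_cols = [sorted(c) for c in columns]  # empirical quantile tables
    lower = cholesky(correlation_matrix([normal_scores(c) for c in columns]))
    rows = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlated normals: z = L @ eps.
        z = [sum(lower[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        u = [_STD_NORMAL.cdf(v) for v in z]
        # Empirical inverse CDF: pick the observed value at quantile u.
        rows.append([sorted_cols[j][min(int(u[j] * n), n - 1)] for j in range(d)])
    return rows


# Hypothetical toy data: study hours and exam scores for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 48, 61, 70, 66, 80, 85, 79]
synthetic = sample_synthetic([hours, scores], n_samples=5, seed=42)
```

Because the inverse step only indexes into the sorted observed values, this sketch cannot produce values outside the observed support, which is the sense in which the marginals are anchored empirically. A privacy-preserving variant would need to add calibrated noise to both the marginals and the correlation estimate, which is deliberately left out here.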