[2604.04199] Which Leakage Types Matter?
Computer Science > Machine Learning

arXiv:2604.04199 (cs)

[Submitted on 5 Apr 2026]

Title: Which Leakage Types Matter?
Authors: Simon Roth

Abstract: Twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, measure the severity of four data leakage classes in machine learning. Class I (estimation: fitting scalers on the full dataset) is negligible: all nine conditions produce $|\Delta\text{AUC}| \leq 0.005$. Class II (selection: test-set peeking, seed cherry-picking) is substantial: roughly 90% of the measured effect is noise exploitation that inflates reported scores. Class III (memorization) scales with model capacity: $d_z = 0.37$ (Naive Bayes) to $1.11$ (Decision Tree). Class IV (temporal boundary) is invisible under random cross-validation. The textbook emphasis is inverted: normalization leakage matters least; selection leakage at practical dataset sizes matters most.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2604.04199 [cs.LG] (or arXiv:2604.04199v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.04199 (arXiv-issued DOI via DataCite, pending registration)

Submission history
From: Simon Roth [view email]
[v1] Sun, 5 Apr 2026 17:47:46 UTC (235 KB)
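The Class I counterfactual described in the abstract can be sketched as a minimal paired comparison: fit a scaler on the full dataset (leaky) versus on the training fold only (clean), and measure the AUC difference. The dataset, model, and split below are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch of Class I "estimation" leakage: does fitting a StandardScaler on
# the full dataset (test rows included) change held-out AUC versus fitting
# on the training fold only? Synthetic data and logistic regression are
# placeholder choices for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def auc_with_scaler(fit_on_full: bool) -> float:
    scaler = StandardScaler()
    if fit_on_full:
        scaler.fit(X)       # leaky: test rows influence the mean/std estimates
    else:
        scaler.fit(X_tr)    # clean: statistics come from the training fold only
    clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_tr), y_tr)
    return roc_auc_score(y_te, clf.predict_proba(scaler.transform(X_te))[:, 1])

delta = auc_with_scaler(True) - auc_with_scaler(False)
print(f"Delta AUC from scaler leakage: {delta:+.4f}")
```

On a single synthetic split like this, the difference is typically tiny, consistent with the paper's claim that estimation leakage is negligible; the paper's $|\Delta\text{AUC}| \leq 0.005$ figure is an aggregate over its nine conditions, not something this one-off sketch reproduces.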