[2604.04199] Which Leakage Types Matter?


arXiv - Machine Learning 3 min read


Computer Science > Machine Learning — arXiv:2604.04199 (cs.LG), submitted on 5 Apr 2026.

Title: Which Leakage Types Matter?
Authors: Simon Roth

Abstract: Twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, measure the severity of four classes of data leakage in machine learning. Class I (estimation: fitting scalers on the full data) is negligible: all nine conditions produce $|\Delta\text{AUC}| \leq 0.005$. Class II (selection: peeking, seed cherry-picking) is substantial: roughly 90% of the measured effect is noise exploitation that inflates reported scores. Class III (memorization) scales with model capacity, from $d_z = 0.37$ (Naive Bayes) to $d_z = 1.11$ (Decision Tree). Class IV (boundary) is invisible under random cross-validation. The textbook emphasis is inverted: normalization leakage matters least; selection leakage at practical dataset sizes matters most.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2604.04199 [cs.LG] (or arXiv:2604.04199v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2604.04199 (arXiv-issued DOI via DataCite, registration pending)
Submission history: [v1] Sun, 5 Apr 2026 17:47:46 UTC (235 KB), from Simon Roth.
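To make the first two leakage classes concrete, here is a minimal sketch using a scikit-learn pipeline on synthetic data. This is not the paper's benchmark or code; the dataset, model, and split sizes are all illustrative assumptions. It contrasts Class I leakage (fitting a scaler on the full data before splitting) with Class II leakage (reporting the best random seed among many splits instead of the mean):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in dataset (illustrative, not from the paper's benchmark).
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2,
                           random_state=0)

def run(seed, fit_scaler_on_full):
    """Train/evaluate once; optionally leak test statistics into the scaler."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    # Class I leakage happens here: the scaler sees the test fold.
    scaler = StandardScaler().fit(X if fit_scaler_on_full else X_tr)
    clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_tr), y_tr)
    return roc_auc_score(y_te, clf.predict_proba(scaler.transform(X_te))[:, 1])

# Class I (estimation): scaler fitted on all data vs. training fold only.
delta_auc = abs(run(0, True) - run(0, False))
print(f"Class I |dAUC| = {delta_auc:.4f}")

# Class II (selection): taking the best of 20 seeds instead of the mean
# exploits split noise and inflates the reported score.
aucs = [run(s, False) for s in range(20)]
inflation = max(aucs) - float(np.mean(aucs))
print(f"Class II inflation = {inflation:.4f}")
```

On toy runs like this, the Class I gap tends to be tiny while the best-of-seeds gap is visibly larger, which is the ordering the abstract reports; the exact numbers depend on the synthetic data and are not comparable to the paper's measurements.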

Originally published on April 07, 2026. Curated by AI News.

