[2603.22644] Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification
Statistics > Machine Learning
arXiv:2603.22644 (stat)
[Submitted on 23 Mar 2026]

Title: Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification
Authors: Xiaohan Zhu, Mesrob I. Ohannessian, Nathan Srebro

Abstract: We consider a PAC-Bayes-type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor against its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of $\lambda=1$, this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking it with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of $\lambda \gg 1$, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $\lambda$, understa...
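The abstract does not spell out the learning rule's exact form, but the standard PAC-Bayes objective it describes — empirical loss of a randomized predictor plus a $\lambda$-weighted KL penalty to the prior — has a closed-form minimizer over a finite hypothesis class, the Gibbs posterior. The sketch below illustrates this under the assumption that $\lambda$ scales the KL term (consistent with $\lambda=1$ recovering a Bayes-type posterior and $\lambda \gg 1$ regularizing toward the prior); the function name `gibbs_posterior` and the toy numbers are illustrative, not from the paper.

```python
import math

def gibbs_posterior(prior, emp_losses, lam, n):
    """Minimize  E_q[L_hat] + lam * KL(q || prior) / n  over distributions q
    on a finite hypothesis class. The minimizer is the Gibbs posterior:
        q(h)  proportional to  prior(h) * exp(-n * L_hat(h) / lam).
    """
    weights = [p * math.exp(-n * loss / lam)
               for p, loss in zip(prior, emp_losses)]
    z = sum(weights)
    return [w / z for w in weights]

# Toy setup: three hypotheses, uniform prior, n = 100 training samples,
# emp_losses holding each hypothesis's empirical 0-1 loss.
prior = [1 / 3, 1 / 3, 1 / 3]
emp_losses = [0.10, 0.12, 0.40]

bayes_like = gibbs_posterior(prior, emp_losses, lam=1.0, n=100)    # lambda = 1
regularized = gibbs_posterior(prior, emp_losses, lam=10.0, n=100)  # lambda >> 1
```

With $\lambda=1$ the posterior concentrates sharply on the lowest-empirical-loss hypothesis, while $\lambda \gg 1$ keeps it closer to the prior; the paper's point is that in the noisy agnostic setting, a suitably sample-size-dependent choice of $\lambda$ is what makes the excess loss vanish.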