[2603.25839] A Compression Perspective on Simplicity Bias
Computer Science > Machine Learning
arXiv:2603.25839 (cs)
[Submitted on 26 Mar 2026]

Title: A Compression Perspective on Simplicity Bias
Authors: Tom Marty, Eric Elmoznino, Leo Gagnon, Tejas Kasetty, Mizu Nishikawa-Toomey, Sarthak Mittal, Guillaume Lajoie, Dhanya Sridhar

Abstract: Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minimum Description Length principle, formalizing supervised learning as a problem of optimal two-part lossless compression. Our theory explains how simplicity bias governs feature selection in neural networks through a fundamental trade-off between model complexity (the cost of describing the hypothesis) and predictive power (the cost of describing the data). Our framework predicts that as the amount of available training data increases, learners transition through qualitatively different features -- from simple spurious shortcuts to complex features -- only when the reduction in data encoding cost justifies the increased model complexity. Consequently, we identify distinct data regimes where increasing data promotes robustness by ruling out trivial shortcuts, and conversely, regimes where limiting data can act as a form of complexity-based regularization, preventing the...
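The two-part trade-off described in the abstract can be sketched numerically. In a minimal toy model (not from the paper; all numbers are hypothetical), each hypothesis H has a fixed description cost L(H) plus a per-example label-encoding cost, so the total code length is L(H) + L(D|H). A cheap "shortcut" feature wins at small n, and a costlier but more predictive feature wins once n passes the crossover point:

```python
# Hedged sketch of the two-part MDL trade-off; the specific bit
# counts below are illustrative assumptions, not values from the paper.

def description_length(model_bits: float, bits_per_label: float, n: int) -> float:
    """Two-part code length: L(H) + L(D | H) for n labeled examples."""
    return model_bits + bits_per_label * n

def preferred_hypothesis(n: int) -> str:
    """Return the MDL-preferred hypothesis at dataset size n.

    Hypothetical settings:
      - "shortcut": cheap to describe (50 bits) but compresses labels
        poorly (0.30 bits/example), like a spurious feature.
      - "complex": expensive to describe (500 bits) but compresses
        labels well (0.05 bits/example).
    """
    shortcut = description_length(model_bits=50, bits_per_label=0.30, n=n)
    complex_feature = description_length(model_bits=500, bits_per_label=0.05, n=n)
    return "shortcut" if shortcut <= complex_feature else "complex"

# Crossover where the codes tie: n* = (500 - 50) / (0.30 - 0.05) = 1800.
# Below n*, the extra model bits of the complex feature are not yet
# justified by its savings on the data term; above n*, they are.
```

This mirrors the abstract's prediction: increasing data eventually rules out the shortcut, while in the small-n regime the shortcut is the genuinely shorter description.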