[2604.00394] Deep Networks Favor Simple Data
Computer Science > Machine Learning
arXiv:2604.00394 (cs)
[Submitted on 1 Apr 2026]

Title: Deep Networks Favor Simple Data
Authors: Weyl Lu, Chenjie Hao, Yubei Chen

Abstract: Estimated density is often interpreted as indicating how typical a sample is under a model. Yet deep models trained on one dataset can assign \emph{higher} density to simpler out-of-distribution (OOD) data than to in-distribution test data. We refer to this behavior as the OOD anomaly. Prior work typically studies this phenomenon within a single architecture, detector, or benchmark, implicitly assuming certain canonical densities. We instead separate the trained network from the density estimator built from its representations or outputs. We introduce two estimators, a Jacobian-based estimator and an autoregressive self-estimator, making density analysis applicable to a wide range of models. Applying this perspective to models including iGPT, PixelCNN++, Glow, score-based diffusion models, DINOv2, and I-JEPA, we find the same striking regularity, one that goes beyond the OOD anomaly: \textbf{lower-complexity samples receive higher estimated density, while higher-complexity samples receive lower estimated density}. This ordering appears within a test set and across OOD pairs such as CIFAR-10 and SVHN, and remains highly consistent across independently trained models. To quantify ...
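The abstract names two kinds of density estimators built on top of a trained network. As a minimal sketch only (not the authors' implementation), the snippet below illustrates the two standard constructions such estimators are typically based on: a change-of-variables log-density using the Jacobian of a mapping, and an autoregressive log-likelihood summed from a model's own next-token predictions. The functions `f`, `base_log_prob`, and `model` are hypothetical placeholders, and the shapes assumed in the comments are assumptions, not details taken from the paper.

```python
import torch

def jacobian_log_density(f, x, base_log_prob):
    """Change-of-variables sketch: log p(x) = log p_Z(f(x)) + log|det J_f(x)|.

    Assumes x is a single flattened sample of shape (d,) and f maps R^d -> R^d.
    """
    z = f(x)
    J = torch.autograd.functional.jacobian(f, x)   # (d, d) Jacobian of f at x
    _, logabsdet = torch.linalg.slogdet(J)         # log|det J_f(x)|
    return base_log_prob(z) + logabsdet

def autoregressive_log_density(model, tokens):
    """Autoregressive self-estimate sketch: log p(x) = sum_i log p(x_i | x_<i).

    Assumes `tokens` is a LongTensor of shape (n,) and `model(tokens[:-1])`
    returns next-token logits of shape (n-1, vocab_size) under teacher forcing.
    """
    logits = model(tokens[:-1])                            # predict tokens 1..n-1
    log_probs = torch.log_softmax(logits, dim=-1)
    picked = log_probs.gather(-1, tokens[1:].unsqueeze(-1)).squeeze(-1)
    return picked.sum()
```

Under this reading, the "simplicity" finding amounts to comparing such per-sample log-density estimates across inputs of differing complexity; the specific estimators and complexity measures used in the paper are described in the full text.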