[2512.04165] Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Computer Science > Machine Learning
arXiv:2512.04165 (cs)
[Submitted on 3 Dec 2025 (v1), last revised 23 Mar 2026 (this version, v4)]

Title: Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Authors: Noa Rubin, Orit Davidovich, Zohar Ringel

Abstract: Two pressing topics in the theory of deep learning are the interpretation of feature-learning (FL) mechanisms and the determination of the implicit bias of networks in the rich regime. Current theories of rich FL often take the form of high-dimensional non-linear equations that require computationally intensive numerical solutions. Given the many details that go into defining a deep-learning problem, this analytical complexity is a significant and often unavoidable challenge. Here, we propose a powerful heuristic route for predicting the data and width scales at which various patterns of FL emerge. This scale analysis is considerably simpler than the corresponding exact theories and reproduces the scaling exponents of various known results. In addition, we make novel predictions for complex toy architectures, such as three-layer non-linear networks and attention heads, thus extending the scope of first-principles theories of deep learning.

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)