[2409.20250] Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance
Summary
This article summarizes a study of how input-label correlation governs the generalization of random feature models (RFMs) under spiked-covariance data, identifying a phase transition between effectively linear and genuinely nonlinear behavior.
Why It Matters
RFMs are among the simplest nonlinear predictors, yet classical asymptotic theory predicts they offer no advantage over linear methods on isotropic data. Characterizing when RFMs genuinely outperform linear baselines clarifies how data structure (anisotropy and input-label correlation) drives model performance, which informs model selection and the design of predictive algorithms.
Key Takeaways
- Input-label correlation is key to the performance of RFMs.
- When input-label correlation and spike magnitude are large enough, RFMs transition from effectively linear to genuinely nonlinear predictors.
- Numerical simulations support the theoretical findings of the study.
- The study establishes a boundary in the correlation-spike-magnitude plane.
- Understanding these transitions can enhance model selection in practical applications.
Statistics > Machine Learning
arXiv:2409.20250 (stat)
[Submitted on 30 Sep 2024 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance
Authors: Samet Demir, Zafer Dogan
Abstract: Random feature models (RFMs), two-layer networks with a randomly initialized fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial (equivalently, which Hermite orders of the activation survive) is...
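The setup the abstract describes can be sketched numerically: an RFM (fixed random first layer, ridge-trained linear readout) compared against plain ridge regression on spiked-covariance data whose labels correlate with the spike direction. This is only an illustrative sketch, not the paper's exact model; all dimensions, the spike magnitude `theta`, the label function, and the regularization values are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Spiked-covariance data (illustrative, not the paper's exact design) ---
d, n, n_test = 50, 500, 500
spike = rng.standard_normal(d)
spike /= np.linalg.norm(spike)          # unit spike direction u
theta = 5.0                             # spike magnitude (hypothetical value)

def sample(m):
    """Draw m inputs with covariance I + theta * u u^T and correlated labels."""
    z = rng.standard_normal((m, d))
    X = z + np.sqrt(theta) * rng.standard_normal((m, 1)) * spike
    s = X @ spike                       # projection onto the spike
    y = s + 0.5 * s**2                  # labels with a nonlinear component
    return X, y

Xtr, ytr = sample(n)
Xte, yte = sample(n_test)

# --- Random feature model: frozen random first layer + ridge readout ---
p, lam = 400, 1e-1
W = rng.standard_normal((d, p)) / np.sqrt(d)   # fixed first-layer weights

def features(X):
    return np.maximum(X @ W, 0.0)              # ReLU random features

F = features(Xtr)
a = np.linalg.solve(F.T @ F + lam * np.eye(p), F.T @ ytr)  # trained readout

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

rfm_err = mse(features(Xte) @ a, yte)

# --- Linear baseline: ridge regression on raw inputs ---
b = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
ridge_err = mse(Xte @ b, yte)

print(f"RFM test MSE:   {rfm_err:.3f}")
print(f"Ridge test MSE: {ridge_err:.3f}")
```

In runs like this one, the linear baseline cannot fit the even (quadratic) component of the labels at all, while the random features can pick up some of it along the inflated spike direction, which is the kind of nonlinear gain the paper's phase transition delineates.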