[2505.11312] Where You Place the Norm Matters: From Prejudiced to Neutral Initializations
Computer Science > Machine Learning

arXiv:2505.11312 (cs)

[Submitted on 16 May 2025 (v1), last revised 2 Apr 2026 (this version, v4)]

Title: Where You Place the Norm Matters: From Prejudiced to Neutral Initializations

Authors: Emanuele Francazi, Francesco Pinto, Aurelien Lucchi, Marco Baity-Jesi

Abstract: Normalization layers were introduced to stabilize and accelerate training, yet their influence is already critical at initialization, where they shape signal propagation and output statistics before the parameters adapt to data. In practice, both which normalization to use and where to place it are often chosen heuristically, despite the fact that these decisions can qualitatively alter a model's behavior. We provide a theoretical characterization of how the choice and placement of normalization (Pre-Norm vs. Post-Norm) determine the distribution of class predictions at initialization, ranging from unbiased (Neutral) to highly concentrated (Prejudiced) regimes. We show that these architectural decisions induce systematic shifts in the initial prediction regime, thereby modulating subsequent learning dynamics. By linking normalization design directly to prediction statistics at initialization, our results offer principled guidance for more controlled and interpretable network design, including clarifying how wi...
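The Neutral vs. Prejudiced distinction described in the abstract can be probed empirically. The sketch below is a toy illustration, not the paper's analysis: it builds a randomly initialized ReLU MLP in NumPy, varies where a (parameter-free) layer normalization is placed, and measures how concentrated the argmax class predictions are over a batch of random inputs. The `max_class_frequency` statistic is an assumed proxy for the prediction regime: a value near 1/C indicates a Neutral initialization, a value near 1.0 a Prejudiced one. The specific depth, width, and init scales are arbitrary choices for the demo.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Per-sample feature normalization (no learned affine, as at initialization)."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mlp_logits(x, rng, depth=10, width=256, n_classes=10, norm="none"):
    """Forward pass of a randomly initialized ReLU MLP.

    norm: "none" (no normalization), "pre" (normalize before each linear map),
    or "post" (normalize after each ReLU). He-style Gaussian init throughout.
    """
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / h.shape[1]), (h.shape[1], width))
        if norm == "pre":
            h = layer_norm(h)
        h = np.maximum(h @ W, 0.0)
        if norm == "post":
            h = layer_norm(h)
    W_out = rng.normal(0.0, np.sqrt(1.0 / width), (width, n_classes))
    return h @ W_out

def max_class_frequency(logits):
    """Fraction of inputs assigned to the single most popular class.

    1/C corresponds to a perfectly Neutral initialization; 1.0 means the
    untrained network predicts the same class for every input (Prejudiced).
    """
    preds = logits.argmax(-1)
    counts = np.bincount(preds, minlength=logits.shape[-1])
    return counts.max() / len(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2048, 64))
    for placement in ("none", "pre", "post"):
        f = max_class_frequency(mlp_logits(x, rng, norm=placement))
        print(f"{placement:>4}: max class frequency = {f:.2f}")
```

Running this for several seeds shows how strongly the concentration statistic depends on whether and where normalization is inserted, which is the quantity the paper characterizes theoretically.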