[2509.10167] The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagram
Computer Science > Machine Learning
arXiv:2509.10167 (cs)
[Submitted on 12 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagram
Authors: Lénaïc Chizat

Abstract: We study the gradient-based training of large-depth residual networks (ResNets) from standard random initializations. We show that infinite-depth ResNets behave as if they were infinitely wide, regardless of their actual width. More precisely, we obtain that with a fixed embedding dimension $D$, the training dynamics converges to a unique Neural Mean ODE training dynamics as the depth $L$ diverges, regardless of the scaling of the hidden width $M$. For a residual scale $\Theta_D\big(\frac{\alpha}{LM}\big)$ with $\alpha=\Theta_D(1)$, we obtain the error bound $O_D\big(\frac{1}{L}+ \frac{1}{\sqrt{LM}}\big)$ between the model's output and its limit after a fixed number of gradient steps. In this regime, the limit exhibits maximal local feature updates, i.e., the Mean ODE is genuinely non-linearly parameterized. In contrast, we show that $\alpha \to \infty$ yields a lazy ODE regime where the Mean ODE is linearly parameterized, and we derive a convergence rate in this case as well. We then focus on the particular case of ResNets with two-layer perceptron blocks, for which we study how these ...
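
As a rough illustration (not part of the abstract), the following is a minimal sketch of the parameterization the abstract describes: a depth-$L$ ResNet with two-layer perceptron blocks of hidden width $M$, embedding dimension $D$, and residual scale $\alpha/(LM)$. The Gaussian initialization scale and the ReLU nonlinearity are assumptions here; the abstract only specifies "standard random initializations".

import numpy as np

def init_resnet(D, M, L, seed=0):
    """Random initialization of L two-layer perceptron residual blocks."""
    rng = np.random.default_rng(seed)
    # Assumption: i.i.d. Gaussian entries with 1/sqrt(fan-in) scale; the abstract
    # only says "standard random initializations".
    return [(rng.normal(0.0, 1.0 / np.sqrt(D), size=(M, D)),
             rng.normal(0.0, 1.0 / np.sqrt(M), size=(D, M)))
            for _ in range(L)]

def forward(params, h, alpha=1.0):
    """Forward pass of a depth-L ResNet with residual scale alpha / (L * M)."""
    L = len(params)
    M = params[0][0].shape[0]
    for W1, W2 in params:
        # Each residual branch is a width-M two-layer perceptron (ReLU assumed),
        # scaled by alpha / (L * M) as in the abstract.
        h = h + (alpha / (L * M)) * (W2 @ np.maximum(W1 @ h, 0.0))
    return h

# Example: embedding dimension D=16, hidden width M=64, depth L=1000.
params = init_resnet(D=16, M=64, L=1000)
x = np.random.default_rng(1).normal(size=16)
print(forward(params, x).shape)  # -> (16,)

In this sketch, the hidden width M only enters through the block computation and the 1/(L*M) scale, which is the regime in which the abstract states the depth-L dynamics approach the Neural Mean ODE limit regardless of how M scales.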