[2509.10167] The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagram


About this article


Computer Science > Machine Learning — arXiv:2509.10167 (cs)
[Submitted on 12 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagram
Authors: Lénaïc Chizat

Abstract: We study the gradient-based training of large-depth residual networks (ResNets) from standard random initializations. We show that infinite-depth ResNets behave as if they were infinitely wide, regardless of their actual width. More precisely, we obtain that with a fixed embedding dimension $D$, the training dynamics converges to a unique Neural Mean ODE training dynamics as the depth $L$ diverges, regardless of the scaling of the hidden width $M$. For a residual scale $\Theta_D\big(\frac{\alpha}{LM}\big)$ with $\alpha=\Theta_D(1)$, we obtain the error bound $O_D\big(\frac{1}{L}+ \frac{1}{\sqrt{LM}}\big)$ between the model's output and its limit after a fixed number of gradient steps. In this regime, the limit exhibits maximal local feature updates, i.e. the Mean ODE is genuinely non-linearly parameterized. In contrast, we show that $\alpha \to \infty$ yields a lazy ODE regime where the Mean ODE is linearly parameterized, and we derive a convergence rate in this case as well. We then focus on the particular case of ResNets with two-layer perceptron blocks, for which we study how these ...
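To make the parameterization concrete, here is a minimal sketch (not from the paper's code) of the architecture the abstract describes: a depth-$L$ ResNet with embedding dimension $D$, two-layer perceptron blocks of hidden width $M$, and a residual scale of $\alpha/(LM)$. The ReLU nonlinearity, the Gaussian initialization, and all variable names and sizes below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def resnet_forward(x, params, alpha):
    """Forward pass of a depth-L ResNet with two-layer perceptron blocks.

    x      : (D,) input embedding
    params : list of L pairs (W1, W2), W1 of shape (M, D), W2 of shape (D, M)
    alpha  : residual-scale parameter; each block is scaled by alpha / (L * M)
    """
    L = len(params)
    M = params[0][0].shape[0]
    h = x.copy()
    for W1, W2 in params:
        # Two-layer perceptron block (ReLU assumed) with a scaled residual connection,
        # matching the Theta(alpha / (L * M)) residual scale from the abstract.
        h = h + (alpha / (L * M)) * (W2 @ np.maximum(W1 @ h, 0.0))
    return h

# Hypothetical sizes for illustration: D = embedding dim, M = hidden width, L = depth.
D, M, L, alpha = 16, 64, 256, 1.0
rng = np.random.default_rng(0)
# "Standard random initialization" is modeled here as i.i.d. standard Gaussians (an assumption).
params = [(rng.standard_normal((M, D)), rng.standard_normal((D, M))) for _ in range(L)]
x = rng.standard_normal(D)
y = resnet_forward(x, params, alpha)
```

In this sketch, $\alpha$ of order one corresponds to the feature-learning regime discussed in the abstract, while taking $\alpha \to \infty$ (with the same $1/(LM)$ factor) corresponds to the lazy ODE regime.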

Originally published on March 04, 2026. Curated by AI News.

