[2602.20921] On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective
Summary
This paper explores the generalization behavior of deep residual networks (ResNets) through a dynamical systems framework, establishing new error bounds that unify discrete and continuous-time models.
Why It Matters
Understanding the generalization capabilities of deep learning models is crucial for their effective application in real-world scenarios. This research clarifies how depth and architectural structure shape ResNet generalization, which can guide model design and inform how much training data such models realistically need.
Key Takeaways
- Establishes generalization error bounds for ResNets using dynamical systems.
- Combines Rademacher complexity with flow maps to derive new insights.
- Offers a unified understanding of generalization across discrete and continuous settings.
- Helps close the gap between discrete- and continuous-time settings in both the order of sample complexity and the required assumptions.
- Contributes to the theoretical foundation of deep learning model performance.
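The dynamical-system view mentioned above treats each residual block x ← x + h·f(x) as one forward-Euler step of an ODE dx/dt = f(x), so that in the deep-layer limit the network converges to the ODE's flow map. A minimal sketch of this correspondence (illustrative only; `resnet_forward` and the choice f(x) = -x are assumptions for the example, not the paper's construction):

```python
import numpy as np

def resnet_forward(x0, f, depth, T=1.0):
    """Run a toy ResNet whose l-th block computes x + h*f(x),
    i.e. a forward-Euler discretization of dx/dt = f(x) on [0, T]."""
    h = T / depth          # step size shrinks as depth grows
    x = x0
    for _ in range(depth):
        x = x + h * f(x)   # one residual block = one Euler step
    return x

# Example: f(x) = -x has the exact flow map x(T) = exp(-T) * x0,
# so a deep ResNet should approach exp(-1) * x0 for T = 1.
f = lambda x: -x
x0 = np.array([1.0, 2.0])
deep = resnet_forward(x0, f, depth=1000)
print(np.max(np.abs(deep - np.exp(-1.0) * x0)))  # gap shrinks as depth grows
```

This convergence of the discrete network to the continuous flow map is what lets the paper transfer generalization bounds between the two settings.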
Computer Science > Machine Learning
arXiv:2602.20921 (cs) [Submitted on 24 Feb 2026]
Title: On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective
Authors: Jinshu Huang, Mingfei Sun, Chunlin Wu
Abstract: Deep neural networks (DNNs) have significantly advanced machine learning, with model depth playing a central role in their successes. The dynamical system modeling approach has recently emerged as a powerful framework, offering new mathematical insights into the structure and learning behavior of DNNs. In this work, we establish generalization error bounds for both discrete- and continuous-time residual networks (ResNets) by combining Rademacher complexity, flow maps of dynamical systems, and the convergence behavior of ResNets in the deep-layer limit. The resulting bounds are of order $O(1/\sqrt{S})$ with respect to the number of training samples $S$, and include a structure-dependent negative term, yielding depth-uniform and asymptotic generalization bounds under milder assumptions. These findings provide a unified understanding of generalization across both discrete- and continuous-time ResNets, helping to close the gap in both the order of sample complexity and assumptions between the discrete- and continuous-time settings.
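For context, the standard Rademacher-complexity generalization bound that analyses of this kind refine has the following shape (a textbook statement, not the paper's exact theorem; $\mathcal{H}$, $\ell$, and $\delta$ are the usual hypothesis class, bounded loss, and confidence parameter):

```latex
\mathbb{E}\,\ell(h(x), y)
\;\le\;
\frac{1}{S}\sum_{i=1}^{S} \ell\bigl(h(x_i), y_i\bigr)
\;+\; 2\,\mathfrak{R}_S(\mathcal{H})
\;+\; c\,\sqrt{\frac{\log(1/\delta)}{2S}}
\quad \text{with probability } \ge 1-\delta .
```

The Rademacher term $\mathfrak{R}_S(\mathcal{H})$ typically scales as $O(1/\sqrt{S})$, matching the rate stated in the abstract; the paper's contribution lies in controlling this term uniformly in depth via flow maps and in the additional structure-dependent negative term.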