[2503.19605] Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral
Summary
This article presents a formalization of generalization error bounds using Rademacher complexity and Dudley's entropy integral, implemented in Lean 4. It offers a rigorous approach to understanding machine learning performance guarantees.
Why It Matters
The study enhances the foundational understanding of generalization in machine learning, providing a rigorous mathematical framework that can improve the reliability of machine learning models. This is crucial for researchers and practitioners aiming to develop robust algorithms with theoretical backing.
Key Takeaways
- Formalizes generalization error bounds using Rademacher complexity.
- Utilizes Lean 4 for a mechanically-checked mathematical framework.
- Introduces a reusable mechanism for extending results to broader hypothesis classes.
- Applies findings to standard empirical Rademacher bounds for linear predictors.
- Formalizes Dudley-type entropy integral bounds based on covering numbers.
Computer Science > Machine Learning arXiv:2503.19605 (cs) [Submitted on 25 Mar 2025 (v1), last revised 20 Feb 2026 (this version, v4)] Title:Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral Authors:Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda View a PDF of the paper titled Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral, by Sho Sonoda and 4 other authors View PDF HTML (experimental) Abstract:Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from a finite training sample -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical $0$--$1$ classification. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from coun...