[2602.23219] Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime
Summary
This paper investigates Takeuchi's information criterion (TIC) as a generalization measure for deep neural networks (DNNs) near the neural tangent kernel (NTK) regime, developing theory for when TIC applies and supporting it with large-scale empirical evidence.
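For context, the classical criterion penalizes the training fit with a trace-based complexity term. A standard per-sample form, under the usual maximum-likelihood setting, is sketched below; the paper's exact normalization and notation may differ:

```latex
% Classical TIC for a maximum-likelihood estimate \hat{\theta} on n i.i.d.
% samples (standard textbook form; the paper's own normalization may differ).
% \hat{J}: negative average Hessian of the log-likelihood at \hat{\theta}
% \hat{I}: empirical covariance of the per-sample score at \hat{\theta}
\mathrm{TIC}
  = -\frac{1}{n}\sum_{i=1}^{n}\log p(x_i \mid \hat{\theta})
  + \frac{1}{n}\,\operatorname{tr}\!\bigl(\hat{J}^{-1}\hat{I}\bigr)
```

When the model is well specified, $\hat{J}$ and $\hat{I}$ coincide asymptotically, so the penalty reduces to (number of parameters)/n and TIC recovers AIC's complexity term; the trace term is what lets TIC adapt to misspecification. Near the NTK regime the network behaves approximately linearly in its parameters, which is the setting in which the paper's theory indicates this classical analysis remains applicable.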
Why It Matters
Understanding generalization in DNNs is crucial for improving model performance and reliability. This study highlights TIC's potential as a robust measure under specific conditions, which can aid researchers and practitioners in optimizing DNNs and diagnosing generalization gaps.
Key Takeaways
- TIC effectively explains generalization gaps in DNNs close to the NTK regime.
- The study involved training over 5,000 DNN models across various architectures and datasets.
- TIC shows better trial-pruning ability than existing hyperparameter optimization methods.
- Correlation between TIC values and generalization gaps diminishes outside the NTK regime.
- The research offers practical TIC approximation methods with manageable computational costs; a minimal sketch of one such estimator follows this list.
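To illustrate why approximations are needed at all (the exact trace term requires inverting a Hessian over every parameter), here is a minimal, hedged sketch of one *possible* cheap estimator: a diagonal approximation of tr(Ĵ⁻¹Î)/n for a toy PyTorch model. This is not the paper's method; the model, data, probe count, and clamping threshold below are all illustrative assumptions.

```python
# Hedged sketch: diagonal approximation of the TIC penalty tr(J^{-1} I) / n
# for a toy regression MLP. All setup here is an illustrative assumption,
# not the paper's architectures, datasets, or approximation schemes.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and model (stand-ins for the paper's experimental setup).
n, d = 256, 8
X, y = torch.randn(n, d), torch.randn(n, 1)
model = nn.Sequential(nn.Linear(d, 16), nn.Tanh(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
n_params = sum(p.numel() for p in model.parameters())

def flat_grad(loss, create_graph=False):
    """Flatten d(loss)/d(params) into a single vector."""
    grads = torch.autograd.grad(loss, model.parameters(),
                                create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

# diag(I_hat): empirical covariance of the per-sample gradients.
sq_sum = torch.zeros(n_params)
mean = torch.zeros(n_params)
for i in range(n):
    g = flat_grad(loss_fn(model(X[i:i + 1]), y[i:i + 1]))
    sq_sum += g * g
    mean += g
I_diag = sq_sum / n - (mean / n) ** 2

# diag(J_hat): Hessian diagonal of the average loss via Hutchinson probes
# on Hessian-vector products (E[v * (Hv)] = diag(H) for Rademacher v).
full_grad = flat_grad(loss_fn(model(X), y), create_graph=True)
J_diag = torch.zeros(n_params)
num_probes = 20
for _ in range(num_probes):
    v = torch.randint(0, 2, (n_params,)).float() * 2.0 - 1.0  # Rademacher
    hv = torch.autograd.grad(full_grad @ v, model.parameters(),
                             retain_graph=True)
    J_diag += v * torch.cat([h.reshape(-1) for h in hv])
J_diag /= num_probes

# Diagonal TIC penalty; the clamp is a crude guard against zero or negative
# curvature estimates, which arise for nonconvex networks.
penalty = (I_diag / J_diag.clamp(min=1e-6)).sum() / n
print(f"diagonal TIC penalty estimate: {penalty.item():.4f}")
```

Trading the full matrix inverse for a diagonal approximation reduces the cost from cubic in the parameter count to a handful of gradient and Hessian-vector-product passes, which is the general flavor of trade-off the paper evaluates when it assesses approximation accuracy against computational cost.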
Abstract
[Submitted on 26 Feb 2026 · arXiv:2602.23219 (cs)]
Authors: Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato

Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC approximation methods with feasible computational costs and assessed the accuracy trade-off. Our experimental results indicate that the...