[2508.11522] Finite-Width Neural Tangent Kernels from Feynman Diagrams
Summary
This paper introduces Feynman-diagram techniques for computing finite-width corrections to neural tangent kernel (NTK) statistics, sharpening the analysis of training dynamics in deep networks beyond the infinite-width limit.
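For reference, the empirical NTK of a network $f(x;\theta)$ is the standard object (background definition, not a formula quoted from the paper)

$$\Theta(x, x') \;=\; \sum_{\mu} \frac{\partial f(x;\theta)}{\partial \theta_\mu}\,\frac{\partial f(x';\theta)}{\partial \theta_\mu}.$$

At infinite width this kernel becomes deterministic and stays frozen during training; at finite width $n$ it fluctuates over initializations and evolves during training, with leading corrections suppressed by $1/n$.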
Why It Matters
Understanding finite-width effects in neural networks matters because properties of real training, such as NTK evolution and feature learning, are absent in the infinite-width limit. This research provides a computational framework that simplifies the derivation of finite-width corrections to NTK statistics, enabling leading-order predictions of training dynamics in practical, finite-width networks.
Key Takeaways
- Introduces Feynman diagrams for computing finite-width corrections to NTK statistics.
- Simplifies the algebraic manipulations needed to derive layer-wise recursion relations for preactivation and NTK statistics.
- Extends stability results for deep networks from preactivations to NTKs.
- Proves the absence of certain finite-width corrections for scale-invariant nonlinearities such as ReLU.
- Numerically validates the framework for neural networks with widths greater than 20 (see the illustrative sketch after this list).
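To make the finite-width NTK concrete, here is a minimal sketch in JAX (not code from the paper, and not its diagrammatic method): it computes the empirical NTK of a small ReLU MLP in NTK parametrization and shows how NTK fluctuations across random initializations shrink as the width grows. The architecture, widths, and inputs are arbitrary illustrative choices.

```python
# Minimal illustrative sketch: empirical NTK of a finite-width ReLU MLP via autodiff.
import jax
import jax.numpy as jnp

def init_params(key, widths):
    """Unit-Gaussian weights; widths = [d_in, n, ..., n, 1] (NTK parametrization)."""
    keys = jax.random.split(key, len(widths) - 1)
    return [jax.random.normal(k, (d_out, d_in))
            for k, d_in, d_out in zip(keys, widths[:-1], widths[1:])]

def mlp(params, x):
    """Scalar-output ReLU MLP; 1/sqrt(fan_in) factors carried in the forward pass."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(W @ h * jnp.sqrt(2.0 / W.shape[1]))
    W_out = params[-1]
    return (W_out @ h)[0] / jnp.sqrt(W_out.shape[1])

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = sum over parameters of df(x1)/dtheta * df(x2)/dtheta."""
    g1 = jax.tree_util.tree_leaves(jax.grad(mlp)(params, x1))
    g2 = jax.tree_util.tree_leaves(jax.grad(mlp)(params, x2))
    return sum(jnp.vdot(a, b) for a, b in zip(g1, g2))

if __name__ == "__main__":
    x1 = jnp.array([1.0, -0.5, 0.3, 0.8])
    x2 = jnp.array([0.2, 0.9, -1.0, 0.4])
    # NTK variance at initialization is O(1/width), so the std across seeds
    # should shrink roughly like 1/sqrt(width).
    for width in (8, 64, 512):
        samples = jnp.array([
            empirical_ntk(init_params(jax.random.PRNGKey(1000 * width + i),
                                      [4, width, width, 1]), x1, x2)
            for i in range(30)
        ])
        print(f"width={width:4d}  mean={float(samples.mean()):.3f}  "
              f"std={float(samples.std()):.3f}")
```

The spread of the NTK values over seeds illustrates the finite-width fluctuations that the paper's framework computes analytically; the mean approaches the infinite-width kernel as the width grows.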
Computer Science > Machine Learning · arXiv:2508.11522 (cs)
[Submitted on 15 Aug 2025 (v1), last revised 13 Feb 2026 (this version, v3)]
Title: Finite-Width Neural Tangent Kernels from Feynman Diagrams
Authors: Max Guillen, Philipp Misof, Jan E. Gerken
Abstract: Neural tangent kernels (NTKs) are a powerful tool for analyzing deep, non-linear neural networks. In the infinite-width limit, NTKs can easily be computed for most common architectures, yielding full analytic control over the training dynamics. However, at infinite width, important properties of training such as NTK evolution or feature learning are absent. Nevertheless, finite-width effects can be included by computing corrections to the Gaussian statistics at infinite width. We introduce Feynman diagrams for computing finite-width corrections to NTK statistics. These dramatically simplify the necessary algebraic manipulations and enable the computation of layer-wise recursion relations for arbitrary statistics involving preactivations, NTKs and certain higher-derivative tensors (dNTK and ddNTK) required to predict the training dynamics at leading order. We demonstrate the feasibility of our framework by extending stability results for deep networks from preactivations to NTKs and proving the absence of finite-width corrections for scale-invariant nonlinearities such as ReLU on the diago...
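As background for the abstract's terminology (standard definitions, not quotations from the paper): a nonlinearity $\phi$ is scale-invariant if $\phi(\lambda z) = \lambda\,\phi(z)$ for all $\lambda > 0$, which holds for ReLU and leaky ReLU. Finite-width corrections to NTK statistics organize into a $1/n$ expansion of the schematic form

$$\langle \Theta \rangle \;=\; \Theta^{(0)} \;+\; \frac{1}{n}\,\Theta^{(1)} \;+\; \mathcal{O}\!\left(\frac{1}{n^{2}}\right),$$

where $n$ is the hidden-layer width and $\Theta^{(0)}$ is the infinite-width kernel.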