[2602.22492] From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference
Summary
This paper explores the convergence of shallow Bayesian neural networks to Gaussian processes, focusing on statistical modeling, identifiability, and scalable inference methods.
Why It Matters
Understanding the relationship between Bayesian neural networks and Gaussian processes is crucial for improving statistical modeling techniques in machine learning. This research provides insights into identifiability and offers scalable inference methods, which can enhance predictive performance in real-world applications.
Key Takeaways
- Establishes a general convergence result from shallow Bayesian neural networks to Gaussian processes.
- Introduces a new covariance function defined as a convex mixture of components induced by four widely used activation functions.
- Demonstrates stable hyperparameter estimates and competitive predictive performance.
- Develops a scalable MAP training and prediction procedure using Nyström approximation.
- Highlights the cost-accuracy trade-off in model selection.
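The convex-mixture covariance in the takeaways can be sketched concretely. The abstract does not list which four activation functions the paper uses, so the sketch below mixes just two standard activation-induced GP kernels as stand-ins: the order-1 arc-cosine kernel (the ReLU limit, Cho & Saul) and the erf-network kernel (Williams). The mixture weights and function names are illustrative, not the paper's.

```python
import numpy as np

def relu_kernel(X, Y):
    """Arc-cosine kernel of order 1: the GP covariance induced by a
    ReLU activation with standard-normal weights (Cho & Saul, 2009)."""
    nx = np.linalg.norm(X, axis=1)[:, None]               # (n, 1)
    ny = np.linalg.norm(Y, axis=1)[None, :]               # (1, m)
    cos = np.clip(X @ Y.T / (nx * ny), -1.0, 1.0)         # guard rounding
    theta = np.arccos(cos)
    return (nx * ny / np.pi) * (np.sin(theta) + (np.pi - theta) * cos)

def erf_kernel(X, Y):
    """Covariance induced by an erf activation (Williams, 1998),
    with a bias unit appended to each input."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])
    Ya = np.hstack([np.ones((Y.shape[0], 1)), Y])
    num = 2.0 * Xa @ Ya.T
    dx = 1.0 + 2.0 * np.sum(Xa**2, axis=1)[:, None]
    dy = 1.0 + 2.0 * np.sum(Ya**2, axis=1)[None, :]
    return (2.0 / np.pi) * np.arcsin(np.clip(num / np.sqrt(dx * dy), -1.0, 1.0))

def mixture_kernel(X, Y, weights=(0.5, 0.5)):
    """Convex mixture of activation-induced covariances. A convex
    combination of positive-definite kernels is positive definite,
    which is one of the properties the paper characterizes."""
    w1, w2 = weights
    assert w1 >= 0 and w2 >= 0 and abs(w1 + w2 - 1.0) < 1e-12
    return w1 * relu_kernel(X, Y) + w2 * erf_kernel(X, Y)
```

In the paper the mixture weights would be hyperparameters estimated jointly with the rest of the model; here they are fixed only to keep the sketch short.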
arXiv Details
arXiv:2602.22492 (stat.ML) · Submitted on 26 Feb 2026
Authors: Gracielle Antunes de Araújo, Flávio B. Gonçalves
Abstract
In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabu...
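The Nyström step mentioned in the abstract can be illustrated in a few lines. This is a generic sketch, not the authors' procedure: it uses an RBF kernel as a stand-in covariance (the paper's mixture kernel would slot in identically), hypothetical function names, and the standard Woodbury identity to turn the O(n³) GP solve into an O(nm²) one for m anchor points.

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    # Stand-in squared-exponential covariance.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * d2 / ls**2)

def nystrom_predict(X, y, Xstar, anchors, noise=1e-2, kernel=rbf, jitter=1e-8):
    """GP predictive mean with the n-by-n training covariance replaced by
    its Nystrom approximation K ~= C W^{-1} C^T, where C = K(X, anchors)
    and W = K(anchors, anchors). The Woodbury identity reduces the solve
    to an m-by-m system, so more anchors buy accuracy at higher cost."""
    C = kernel(X, anchors)                        # (n, m) cross-covariance
    W = kernel(anchors, anchors)                  # (m, m) anchor covariance
    W += jitter * np.eye(W.shape[0])              # numerical stabilizer
    s2 = noise
    # Woodbury: (s2*I + C W^{-1} C^T)^{-1} y
    #         = (y - C (s2*W + C^T C)^{-1} C^T y) / s2
    inner = s2 * W + C.T @ C                      # (m, m) system
    alpha = (y - C @ np.linalg.solve(inner, C.T @ y)) / s2
    return kernel(Xstar, X) @ alpha
```

Choosing the anchors (e.g. a random subset of the training inputs versus k-means centers) and the rank m is exactly the cost-accuracy trade-off the takeaways refer to: with anchors equal to the full training set the approximation is exact, and shrinking m trades fidelity for speed.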