[2604.07267] The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours
Statistics > Machine Learning

arXiv:2604.07267 (stat)

[Submitted on 8 Apr 2026]

Title: The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Authors: Robert Allison, Tomasz Maciazek, Anthony Stephenson

Abstract: Gaussian process (GP) regression is a widely used non-parametric modelling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process (NNGP) regression for geospatial problems and the related scalable GPnn method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of NNGP/GPnn remains incomplete. We develop a theoretical framework for NNGP and GPnn regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error (MSE), calibration coefficient (CAL), and negative log-likelihood (NLL). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2\alpha/(2p+d)}$, where $\alpha$ and $p$ capture the regularity of the regression problem. We also prove uniform convergence of MSE over co...
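For reference, one conventional way to instantiate the three predictive criteria, evaluated pointwise at a test input $x$ with observed response $y$, Gaussian predictive mean $\hat{\mu}(x)$, and predictive variance $\hat{\sigma}^2(x)$, is the following. These are the standard definitions from the GP literature, offered here as an assumption; the paper's exact definitions may differ.

$\mathrm{MSE}(x) = \big(y - \hat{\mu}(x)\big)^2, \qquad \mathrm{CAL}(x) = \frac{\big(y - \hat{\mu}(x)\big)^2}{\hat{\sigma}^2(x)}, \qquad \mathrm{NLL}(x) = \frac{1}{2}\log\!\big(2\pi\hat{\sigma}^2(x)\big) + \frac{\big(y - \hat{\mu}(x)\big)^2}{2\hat{\sigma}^2(x)}.$

Under this convention a well-calibrated predictor has CAL close to 1, so almost sure limits for these quantities characterise both the accuracy and the calibration of the nearest-neighbour predictor.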
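The nearest-neighbour prediction scheme the abstract describes can be sketched in a few lines. The following is a minimal, generic illustration, not the authors' implementation: it assumes a squared-exponential kernel with fixed hyperparameters, and the names gpnn_predict, m, and noise are hypothetical.

import numpy as np
from scipy.spatial import cKDTree

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * lengthscale^2)).
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gpnn_predict(X_train, y_train, X_test, m=50, noise=1e-2, lengthscale=1.0):
    # Condition a standard GP on only the m nearest training points of each
    # test input, giving O(m^3) work per prediction instead of O(n^3) overall.
    tree = cKDTree(X_train)                  # O(n log n) neighbour index build
    _, idx = tree.query(X_test, k=m)         # (n_test, m) neighbour indices
    means = np.empty(len(X_test))
    variances = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        Xn, yn = X_train[idx[i]], y_train[idx[i]]
        K = rbf_kernel(Xn, Xn, lengthscale) + noise * np.eye(m)   # local Gram matrix
        k_star = rbf_kernel(Xn, x[None, :], lengthscale)          # shape (m, 1)
        means[i] = k_star[:, 0] @ np.linalg.solve(K, yn)          # local GP posterior mean
        v = np.linalg.solve(K, k_star)
        prior_var = rbf_kernel(x[None, :], x[None, :], lengthscale)[0, 0]
        variances[i] = prior_var - k_star[:, 0] @ v[:, 0] + noise # variance for a noisy target
    return means, variances

With $m$ fixed and much smaller than $n$, the total prediction cost is roughly $O(n \log n)$ for the tree build plus $O(m^3)$ per test point, rather than the $O(n^3)$ of exact GP regression; the paper's theoretical results concern how such predictors behave as $n$ grows.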