[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks
Summary
The paper introduces Gaussian Process Activations (GAPA), a post-hoc method for uncertainty quantification in pretrained networks that adds closed-form epistemic variances without altering the backbone's predictions.
Why It Matters
Reliable uncertainty estimates are essential for deploying AI models safely. GAPA avoids the retraining, Monte Carlo sampling, and second-order computations that traditional methods require, making robust uncertainty quantification easier to add to applications ranging from regression to classification.
Key Takeaways
- GAPA shifts Bayesian modeling from weights to activations for better uncertainty quantification.
- The method preserves original predictions while providing closed-form epistemic variances.
- GAPA is efficient, requiring no sampling or second-order computations, making it suitable for modern architectures.
- It outperforms existing post-hoc methods in calibration and out-of-distribution detection.
- Applicable across various domains including regression, classification, and language modeling.
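The core idea in the takeaways above can be sketched in a toy 1-D form: treat the original activation function as the prior mean of a Gaussian process and condition on cached training pre-activations whose targets are the activation's own values. The residuals then vanish, so the posterior mean equals the original activation exactly, while the posterior variance gives an epistemic signal. This is an illustrative sketch, not the paper's implementation; the kernel, lengthscale, and function names here are assumptions.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel between two 1-D arrays of pre-activations
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_activation(z_test, z_train, act=lambda z: np.maximum(z, 0.0),
                  ls=1.0, jitter=1e-6):
    """Toy 1-D 'GP activation' (hypothetical sketch): the GP prior mean is
    the original activation `act`, and the observations are act(z_train),
    so the residuals are identically zero and the posterior mean equals
    act(z_test) by construction; the variance is the GP posterior variance."""
    K = rbf(z_train, z_train, ls) + jitter * np.eye(len(z_train))
    Ks = rbf(z_test, z_train, ls)
    resid = act(z_train) - act(z_train)          # zero by construction
    alpha = np.linalg.solve(K, resid)
    mean = act(z_test) + Ks @ alpha              # == act(z_test) exactly
    # diag(Ks K^{-1} Ks^T) without forming the full matrix
    var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mean, np.maximum(var, 0.0)
```

A test pre-activation far from every cached training activation gets variance near the prior (≈1 for this kernel), while one inside the cached range gets variance near zero, all in a single deterministic pass.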
Abstract
Subjects: Statistics > Machine Learning (stat.ML)
[Submitted on 16 Feb 2026]
Authors: Richard Bergna, Stefan Depeweg, Sergio Calvo-Ordoñez, Jonathan Plenk, Alvaro Cartea, Jose Miguel Hernández-Lobato
Reliable uncertainty estimates are crucial for deploying pretrained models; yet, many strong methods for quantifying uncertainty require retraining, Monte Carlo sampling, or expensive second-order computations and may alter a frozen backbone's predictions. To address this, we introduce Gaussian Process Activations (GAPA), a post-hoc method that shifts Bayesian modeling from weights to activations. GAPA replaces standard nonlinearities with Gaussian-process activations whose posterior mean exactly matches the original activation, preserving the backbone's point predictions by construction while providing closed-form epistemic variances in activation space. To scale to modern architectures, we use a sparse variational inducing-point approximation over cached training activations, combined with local k-nearest-neighbor subset conditioning, enabling deterministic single-pass uncertainty propagation without sampling, backpropagation, or second-order information. Across regression, classification, image segmentation, and language modeling...
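The local k-nearest-neighbor subset conditioning mentioned in the abstract can also be sketched: instead of conditioning each test pre-activation on all N cached activations (an O(N^3) solve), condition only on its k nearest neighbors, an O(k^3) solve per point. The function below is a hypothetical minimal version under those assumptions, not the paper's implementation.

```python
import numpy as np

def knn_gp_variance(z_test, z_cache, k=16, ls=1.0, jitter=1e-6):
    """Sketch of local k-NN subset conditioning: for each test
    pre-activation, compute the GP posterior variance using only its
    k nearest cached training activations, keeping the per-point cost
    at O(k^3) instead of O(N^3) for the full cache."""
    def rbf(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

    out = np.empty(len(z_test))
    for i, z in enumerate(z_test):
        # select the k cached activations nearest to this test point
        nn = z_cache[np.argsort(np.abs(z_cache - z))[:k]]
        K = rbf(nn, nn) + jitter * np.eye(k)
        ks = rbf(np.array([z]), nn)              # shape (1, k)
        out[i] = 1.0 - (ks @ np.linalg.solve(K, ks.T)).item()
    return np.maximum(out, 0.0)
```

As with the full-cache version, points near the cached activations get low epistemic variance and points far from them approach the prior variance; the difference is that each variance is computed from a small local subset, which is what makes a single deterministic pass tractable at scale.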