[2602.21160] Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Summary

The paper introduces a method for decomposing the epistemic uncertainty of Bayesian classifiers, normally summarised by a single mutual-information scalar, into per-class contributions, revealing not just how uncertain a model is but which classes drive that uncertainty in safety-critical classification tasks.

Why It Matters

Understanding and managing epistemic uncertainty is crucial in safety-critical applications. This research provides a framework that allows for better decision-making by distinguishing uncertainty across different classes, which can significantly reduce risks in areas like healthcare and autonomous systems.

Key Takeaways

  • Introduces a per-class vector for decomposing mutual information, enhancing uncertainty measurement.
  • Demonstrates improved performance in selective prediction and out-of-distribution detection tasks.
  • Highlights the importance of how uncertainty is propagated through neural networks.

Statistics > Machine Learning · arXiv:2602.21160 (stat) · Submitted on 24 Feb 2026

Title: Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Authors: Mame Diarra Toure, David A. Stephens

Abstract: In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image...
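The abstract gives enough to sketch the decomposition numerically: from posterior samples of the predictive probabilities (e.g. from MC dropout or a deep ensemble), compute $\mu_k$ and $\sigma_k^2$ per class, form $C_k = \sigma_k^2/(2\mu_k)$, and check that $\sum_k C_k$ tracks the exact mutual information $H(\mathbb{E}[p]) - \mathbb{E}[H(p)]$. The snippet below is a minimal illustration of that formula only, not the authors' released code; the toy Dirichlet posterior and the function name are assumptions for demonstration.

```python
import numpy as np

def per_class_uncertainty(probs):
    """Per-class epistemic uncertainty C_k = sigma_k^2 / (2 mu_k).

    probs: array of shape (S, K) -- S posterior samples of the
    predicted class probabilities over K classes.
    By the paper's second-order Taylor argument, sum_k C_k
    approximates the mutual information between the prediction
    and the model parameters.
    """
    mu = probs.mean(axis=0)             # mu_k = E[p_k]
    var = probs.var(axis=0)             # sigma_k^2 = Var[p_k]
    return var / (2.0 * mu + 1e-12)     # C_k, eps guards mu_k ~ 0

# Toy posterior over K=3 classes (stand-in for MC-dropout samples)
rng = np.random.default_rng(0)
samples = rng.dirichlet([2.0, 5.0, 1.0], size=1000)

C = per_class_uncertainty(samples)

# Exact MI (the BALD score) for comparison
def entropy(p):
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

mi = entropy(samples.mean(axis=0)) - entropy(samples).mean()
print("per-class C_k:", C, "| sum:", C.sum(), "| exact MI:", mi)
```

The vector `C` shows where the uncertainty sits: the rare class (small $\mu_k$) is not suppressed, because the $1/\mu_k$ weighting compensates for the fact that low-probability classes have mechanically small variances.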
