[2601.11670] A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning
Summary
This paper presents a novel Confidence-Variance (CoVar) theory for pseudo-label selection in semi-supervised learning, addressing the limitations of fixed confidence thresholds in deep learning models.
Why It Matters
The research matters because pseudo-label selection largely determines how well semi-supervised models exploit unlabeled data, and fixed confidence thresholds both admit overconfident errors and discard informative low-confidence samples. By combining maximum confidence with residual-class variance, the study offers a principled remedy for overconfident predictions and demonstrates accuracy gains across various datasets.
Key Takeaways
- Introduces a Confidence-Variance (CoVar) framework for pseudo-label selection.
- Combines maximum confidence with residual-class variance for improved reliability.
- Demonstrates that high-confidence predictions can still be incorrect, necessitating a more nuanced approach.
- Proposes a threshold-free selection mechanism to enhance prediction reliability.
- Shows consistent performance improvements across multiple datasets.
Computer Science > Machine Learning
arXiv:2601.11670 (cs)
[Submitted on 16 Jan 2026 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning
Authors: Jinshi Liu, Pan Liu, Lei He
Abstract: Most pseudo-label selection strategies in semi-supervised learning rely on fixed confidence thresholds, implicitly assuming that prediction confidence reliably indicates correctness. In practice, deep networks are often overconfident: high-confidence predictions can still be wrong, while informative low-confidence samples near decision boundaries are discarded. This paper introduces a Confidence-Variance (CoVar) theory framework that provides a principled joint reliability criterion for pseudo-label selection. Starting from the entropy minimization principle, we derive a reliability measure that combines maximum confidence (MC) with residual-class variance (RCV), which characterizes how probability mass is distributed over non-maximum classes. The derivation shows that reliable pseudo-labels should have both high MC and low RCV, and that the influence of RCV increases as confidence grows, thereby correcting overconfident but unstable predictions. From this perspective, we cast pseudo-label selection as a spectral relaxation problem ...
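The abstract's two ingredients can be sketched in a few lines of NumPy: maximum confidence (MC) is the largest softmax probability, and residual-class variance (RCV) is the variance of the probability mass over the non-maximum classes. The paper's exact combination rule is not given in the abstract, so the score below — an RCV penalty that scales with MC, mirroring the claim that RCV's influence grows as confidence grows — is an illustrative assumption, not the authors' formula.

```python
import numpy as np

def covar_reliability(probs, lam=1.0):
    """Hypothetical CoVar-style reliability score (sketch, not the paper's formula).

    probs: (N, C) array of softmax probabilities for N samples over C classes.
    Returns a per-sample score where higher means more reliable: high MC
    combined with low RCV. `lam` and the MC-scaled penalty are assumptions.
    """
    probs = np.asarray(probs, dtype=float)
    mc = probs.max(axis=1)                         # maximum confidence (MC)
    # Collect the non-maximum (residual) class probabilities per sample.
    top = probs.argmax(axis=1)
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(len(probs)), top] = False
    residual = probs[mask].reshape(len(probs), -1)
    rcv = residual.var(axis=1)                     # residual-class variance (RCV)
    # Penalize RCV more strongly at high confidence, per the abstract's
    # observation that RCV's influence should increase as confidence grows.
    return mc - lam * mc * rcv
```

For two predictions with identical MC, the one that spreads its residual mass evenly (low RCV) scores higher than one concentrating residual mass on a single competitor class (high RCV), which is the qualitative behavior the abstract describes.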