[2601.11670] A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning
Summary
This paper presents a novel Confidence-Variance (CoVar) theory for pseudo-label selection in semi-supervised learning, addressing the limitations of fixed confidence thresholds in deep learning models.
Why It Matters
The research matters because pseudo-label selection largely determines how well semi-supervised models exploit unlabeled data, and fixed confidence thresholds both admit overconfident errors and discard informative low-confidence samples. By combining maximum confidence with residual-class variance, the study offers a principled remedy for overconfident predictions and demonstrates accuracy gains across various datasets.
Key Takeaways
- Introduces a Confidence-Variance (CoVar) framework for pseudo-label selection.
- Combines maximum confidence with residual-class variance for improved reliability.
- Demonstrates that high-confidence predictions can still be incorrect, necessitating a more nuanced approach.
- Proposes a threshold-free selection mechanism to enhance prediction reliability.
- Shows consistent performance improvements across multiple datasets.
Computer Science > Machine Learning
arXiv:2601.11670 (cs)
[Submitted on 16 Jan 2026 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning
Authors: Jinshi Liu, Pan Liu, Lei He
Abstract: Most pseudo-label selection strategies in semi-supervised learning rely on fixed confidence thresholds, implicitly assuming that prediction confidence reliably indicates correctness. In practice, deep networks are often overconfident: high-confidence predictions can still be wrong, while informative low-confidence samples near decision boundaries are discarded. This paper introduces a Confidence-Variance (CoVar) theory framework that provides a principled joint reliability criterion for pseudo-label selection. Starting from the entropy minimization principle, we derive a reliability measure that combines maximum confidence (MC) with residual-class variance (RCV), which characterizes how probability mass is distributed over non-maximum classes. The derivation shows that reliable pseudo-labels should have both high MC and low RCV, and that the influence of RCV increases as confidence grows, thereby correcting overconfident but unstable predictions. From this perspective, we cast pseudo-label selection as a spectral relaxation problem ...
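The abstract's two ingredients can be sketched in a few lines of NumPy: maximum confidence (MC) is the largest softmax probability, and residual-class variance (RCV) is the variance of the probability mass over the non-maximum classes. The paper's exact combination rule is not given in the abstract, so the score below — an RCV penalty that scales with MC, mirroring the claim that RCV's influence grows as confidence grows — is an illustrative assumption, not the authors' formula.

```python
import numpy as np

def covar_reliability(probs, lam=1.0):
    """Hypothetical CoVar-style reliability score (sketch, not the paper's formula).

    probs: (N, C) array of softmax probabilities for N samples over C classes.
    Returns a per-sample score where higher means more reliable: high MC
    combined with low RCV. `lam` and the MC-scaled penalty are assumptions.
    """
    probs = np.asarray(probs, dtype=float)
    mc = probs.max(axis=1)                         # maximum confidence (MC)
    # Collect the non-maximum (residual) class probabilities per sample.
    top = probs.argmax(axis=1)
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(len(probs)), top] = False
    residual = probs[mask].reshape(len(probs), -1)
    rcv = residual.var(axis=1)                     # residual-class variance (RCV)
    # Penalize RCV more strongly at high confidence, per the abstract's
    # observation that RCV's influence should increase as confidence grows.
    return mc - lam * mc * rcv
```

For two predictions with identical MC, the one that spreads its residual mass evenly (low RCV) scores higher than one concentrating residual mass on a single competitor class (high RCV), which is the qualitative behavior the abstract describes.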