[2509.24228] Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Summary
This paper presents a benchmark for evaluating positive-unlabeled (PU) learning algorithms, addressing inconsistencies in experimental settings and proposing methods for fair comparisons.
Why It Matters
Reliable evaluation of PU learning algorithms is crucial for advancing weakly supervised learning. By establishing a standardized benchmark, this work improves the reliability of performance comparisons and supports the development of more effective PU learning algorithms.
Key Takeaways
- Introduces the first benchmark for evaluating PU learning algorithms.
- Identifies inconsistencies in existing experimental settings that hinder fair comparisons.
- Proposes calibration methods to address biases in model selection criteria.
- Highlights the differences between one-sample and two-sample settings in PU learning.
- Aims to provide a more accessible and realistic evaluation environment for future research.
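The one-sample versus two-sample distinction highlighted above concerns how PU data are generated. A minimal sketch of both settings follows; the Gaussian class-conditionals, parameter values, and function names are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sample_pu(n_p, n_u, pi):
    """Two-sample setting: positives and unlabeled data come from separate pools.
    Positives ~ p(x | y=+1); unlabeled ~ the marginal p(x) with class prior pi."""
    x_p = rng.normal(+1.0, 1.0, size=n_p)                # labeled positives
    y_u = rng.random(n_u) < pi                           # latent labels of unlabeled points
    x_u = np.where(y_u,
                   rng.normal(+1.0, 1.0, n_u),           # unlabeled positives
                   rng.normal(-1.0, 1.0, n_u))           # unlabeled negatives
    return x_p, x_u

def one_sample_pu(n, pi, label_freq):
    """One-sample (censoring) setting: a single draw from p(x, y); each positive
    is then labeled with probability label_freq, and everything else stays unlabeled."""
    y = rng.random(n) < pi
    x = np.where(y, rng.normal(+1.0, 1.0, n), rng.normal(-1.0, 1.0, n))
    observed = y & (rng.random(n) < label_freq)          # which positives get labels
    return x[observed], x[~observed]
```

In the two-sample setting the labeled-positive pool is drawn independently of the unlabeled pool, while in the one-sample setting labeled positives are a biased subsample of one dataset; algorithms and model selection criteria can behave differently under the two.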
Computer Science > Machine Learning
arXiv:2509.24228 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v2)]
Title: Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Authors: Wei Wang, Dong-Dong Wu, Ming Li, Jingxiong Zhang, Gang Niu, Masashi Sugiyama
Abstract: Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other han...
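As a concrete illustration of the kind of algorithm such a benchmark compares, here is a minimal sketch of the non-negative PU (nnPU) risk estimate, a widely used PU learning objective that needs only positive scores, unlabeled scores, and a class prior. The hinge loss, the interface, and the name `nnpu_risk` are illustrative choices, not the paper's implementation:

```python
import numpy as np

def nnpu_risk(scores_p, scores_u, pi, loss=lambda z: np.maximum(0.0, 1.0 - z)):
    """Non-negative PU risk estimate.
    scores_p: classifier outputs on labeled positive samples
    scores_u: classifier outputs on unlabeled samples
    pi:       assumed class prior P(y = +1)
    loss:     margin loss l(z); hinge loss by default (illustrative choice)
    """
    r_p_plus = loss(scores_p).mean()       # risk of labeling positives as positive
    r_p_minus = loss(-scores_p).mean()     # risk of labeling positives as negative
    r_u_minus = loss(-scores_u).mean()     # risk of labeling unlabeled as negative
    neg_risk = r_u_minus - pi * r_p_minus  # unbiased estimate of the negative risk
    return pi * r_p_plus + max(0.0, neg_risk)  # clip at zero: the nnPU correction
```

The clipping step is what distinguishes nnPU from the earlier unbiased estimator: without it, the estimated negative risk can go negative on flexible models and encourage overfitting, which is one reason consistent evaluation protocols matter when comparing such algorithms.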