[2509.24228] Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Summary

This paper presents a benchmark for evaluating positive-unlabeled (PU) learning algorithms, addressing inconsistencies in experimental settings and proposing methods for fair comparisons.

Why It Matters

Reliable evaluation of PU learning algorithms is crucial for advancing weakly supervised learning. By establishing a standardized benchmark, this work aims to make performance comparisons more trustworthy, supporting the development of more effective algorithms.

Key Takeaways

  • Introduces the first benchmark for evaluating PU learning algorithms.
  • Identifies inconsistencies in existing experimental settings that hinder fair comparisons.
  • Proposes calibration methods to address biases in model selection criteria.
  • Highlights the differences between one-sample and two-sample settings in PU learning.
  • Aims to provide a more accessible and realistic evaluation environment for future research.
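The one-sample versus two-sample distinction in the takeaways above can be made concrete with a small sketch. This is a minimal illustration assuming a fully labeled source dataset `(X, y)` from which PU data are simulated; the function names and sampling scheme are assumptions for exposition, not the benchmark's actual data pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sample_pu(X, y, n_p, n_u):
    """Two-sample (case-control) setting: positives are drawn from the
    positive marginal, unlabeled points from the full marginal."""
    pos = X[y == 1]
    xp = pos[rng.choice(len(pos), size=n_p, replace=True)]
    xu = X[rng.choice(len(X), size=n_u, replace=True)]
    return xp, xu

def one_sample_pu(X, y, label_frac):
    """One-sample (single-training-set) setting: a single sample comes from
    the joint distribution, then a fraction of its positives are labeled;
    everything else becomes the unlabeled set."""
    labeled = (y == 1) & (rng.random(len(X)) < label_frac)
    return X[labeled], X[~labeled]  # labeled positives, remaining unlabeled
```

Note that in the two-sample setting the unlabeled sample may contain duplicates of labeled positives, while in the one-sample setting the unlabeled set still contains every unlabeled positive; algorithms and evaluation protocols that conflate the two settings are one source of the inconsistencies the paper highlights.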

Computer Science > Machine Learning

arXiv:2509.24228 (cs) [Submitted on 29 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Authors: Wei Wang, Dong-Dong Wu, Ming Li, Jingxiong Zhang, Gang Niu, Masashi Sugiyama

Abstract: Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other han...
