[2504.16585] Leveraging Noisy Manual Labels as Useful Information: An Information Fusion Approach for Enhanced Variable Selection in Penalized Logistic Regression
Summary
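This paper shows that label noise introduced during manual annotation can serve as useful information for variable selection in penalized logistic regression (PLR). It also proposes a partition-insensitive parallel algorithm, based on the alternating direction method of multipliers (ADMM), to exploit this information fusion on large-scale, distributed data.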
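For reference, a common form of the PLR objective is written out below. The summary does not say which penalty the paper uses, so the L1 (lasso) penalty here is an assumption. The intercept is also omitted for brevity. Under this form, "variable selection" means reading off the support of the estimate, that is, the coefficients the penalty does not shrink to exactly zero.

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
  \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, x_i^{\top} \beta)\bigr)
  + \lambda \lVert \beta \rVert_{1},
\qquad y_i \in \{-1, +1\},
\qquad \widehat{S} = \{\, j : \hat{\beta}_j \neq 0 \,\}.
```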
Why It Matters
In supervised learning, label quality strongly affects model accuracy. This research treats label noise, usually viewed as a defect, as a resource. According to the authors, the noise is linked to classification difficulty and therefore carries information that can improve variable selection. This could change how manually annotated data is used in model training.
Key Takeaways
- Label noise can be leveraged to improve variable selection in penalized logistic regression.
- The proposed algorithm is a partition-insensitive parallel method based on ADMM. It ensures global convergence and scales to data too large to store on a single machine.
- Experiments show significant performance improvements over conventional methods.
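In PLR, variable selection happens because the penalty drives some coefficients to exactly zero. The minimal sketch below shows that mechanism using proximal gradient descent (ISTA) on an assumed L1-penalized logistic loss. It uses ground-truth labels only. It does not implement the paper's noisy-label fusion, because the summary does not describe that estimator. The function names, the synthetic data generator, and the defaults (`lam`, `step=0.5`, `iters=1000`) are hypothetical choices for illustration. The fixed step of 0.5 is assumed to be below 1/L for roughly standardized features with a small number of columns.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, tau):
    # Proximal operator of tau * |v|: shrinks v toward 0 and sets small values exactly to 0.
    return math.copysign(max(abs(v) - tau, 0.0), v)


def l1_logistic_ista(X, y, lam, step=0.5, iters=1000):
    """Proximal gradient (ISTA) for
    (1/n) * sum_i log(1 + exp(-y_i * x_i^T beta)) + lam * ||beta||_1, with y_i in {-1, +1}.
    Assumes step <= 1/L, where L bounds the Lipschitz constant of the loss gradient."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            margin = yi * sum(b * v for b, v in zip(beta, xi))
            # d/dz log(1 + exp(-y z)) = -y * sigmoid(-y z)
            w = -yi * sigmoid(-margin) / n
            for j in range(p):
                grad[j] += w * xi[j]
        # Gradient step on the smooth loss, then soft-threshold (prox of the L1 penalty).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


def make_synthetic_data(n, true_beta, seed=0):
    # Hypothetical generator: Gaussian features, labels drawn from the logistic model.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * v for b, v in zip(true_beta, xi))) else -1
         for xi in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data(300, [2.0, -2.0, 0.0, 0.0, 0.0])
    beta = l1_logistic_ista(X, y, lam=0.1)
    print([round(b, 3) for b in beta])
    print("selected:", [j for j, b in enumerate(beta) if b != 0.0])
```

The synthetic data uses true coefficients `[2, -2, 0, 0, 0]`. On this data, the soft-threshold step is expected to zero out the last three coordinates, so only the first two features should be selected. A larger `lam` selects fewer variables.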
arXiv:2504.16585 [Computer Science, Machine Learning]
Submitted on 23 Apr 2025 (v1); last revised 13 Feb 2026 (this version, v2)
Authors: Xiaofei Wu, Rongmei Liangse
Abstract: In large-scale supervised learning, penalized logistic regression (PLR) effectively mitigates overfitting through regularization, yet its performance critically depends on robust variable selection. This paper demonstrates that label noise introduced during manual annotation, often dismissed as a mere artifact, can serve as a valuable source of information to enhance variable selection in PLR. We theoretically show that such noise, intrinsically linked to classification difficulty, helps refine the estimation of non-zero coefficients compared to using only ground truth labels, effectively turning a common imperfection into a useful information resource. To efficiently leverage this form of information fusion in large-scale settings where data cannot be stored on a single machine, we propose a novel partition insensitive parallel algorithm based on the alternating direction method of multipliers (ADMM). ...
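The abstract names an ADMM-based parallel algorithm but does not give its update rules. The sketch below gives a rough picture of how ADMM can split a PLR fit across data partitions. It uses the textbook global-consensus form described by Boyd et al. (2011) with an assumed L1 penalty. It is not the authors' partition-insensitive algorithm, and it ignores the noisy-label fusion entirely. Each partition keeps a local copy of the coefficients and runs a few warm-started gradient steps as an inexact local solve. A shared consensus vector is then updated by soft-thresholding. The names `consensus_admm_l1_logistic`, `split_rows`, and `make_synthetic_data`, and the defaults `rho=0.1`, `outer=150`, and `inner=15`, are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, tau):
    # Proximal operator of tau * |v|.
    return math.copysign(max(abs(v) - tau, 0.0), v)


def local_grad(beta, Xk, yk, n):
    # Gradient of this partition's share of the loss: (1/n) * sum_{i in k} log(1 + exp(-y_i x_i^T beta)).
    g = [0.0] * len(beta)
    for xi, yi in zip(Xk, yk):
        w = -yi * sigmoid(-yi * sum(b * v for b, v in zip(beta, xi))) / n
        for j, v in enumerate(xi):
            g[j] += w * v
    return g


def consensus_admm_l1_logistic(blocks, n, lam, rho=0.1, outer=150, inner=15):
    """Global-consensus ADMM for (1/n) * sum_i log(1 + exp(-y_i x_i^T beta)) + lam * ||beta||_1.
    `blocks` is a list of (X_k, y_k) partitions, each standing in for one machine."""
    K, p = len(blocks), len(blocks[0][0][0])
    x = [[0.0] * p for _ in range(K)]  # local copies beta_k
    u = [[0.0] * p for _ in range(K)]  # scaled dual variables
    z = [0.0] * p                      # global consensus variable
    # Step 1/(L_k + rho), where L_k = sum_i ||x_i||^2 / (4n) bounds the local gradient's Lipschitz constant.
    steps = [1.0 / (sum(v * v for xi in Xk for v in xi) / (4 * n) + rho) for Xk, _ in blocks]
    for _ in range(outer):
        # x-update (parallelizable across partitions): inexact, warm-started gradient descent
        # on l_k(beta) + (rho/2) * ||beta - z + u_k||^2.
        for k, (Xk, yk) in enumerate(blocks):
            for _ in range(inner):
                g = local_grad(x[k], Xk, yk, n)
                x[k] = [xj - steps[k] * (gj + rho * (xj - zj + uj))
                        for xj, gj, zj, uj in zip(x[k], g, z, u[k])]
        # z-update: soft-threshold the average of x_k + u_k with threshold lam / (K * rho).
        z = [soft_threshold(sum(x[k][j] + u[k][j] for k in range(K)) / K, lam / (K * rho))
             for j in range(p)]
        # u-update: accumulate each partition's disagreement with the consensus.
        for k in range(K):
            u[k] = [uj + xj - zj for uj, xj, zj in zip(u[k], x[k], z)]
    return z


def make_synthetic_data(n, true_beta, seed=1):
    # Hypothetical generator: Gaussian features, labels drawn from the logistic model.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * v for b, v in zip(true_beta, xi))) else -1
         for xi in X]
    return X, y


def split_rows(X, y, K):
    # Round-robin partition of the rows into K blocks.
    return [(X[k::K], y[k::K]) for k in range(K)]


if __name__ == "__main__":
    X, y = make_synthetic_data(160, [2.0, -2.0, 0.0, 0.0, 0.0])
    for K in (2, 4):
        beta = consensus_admm_l1_logistic(split_rows(X, y, K), len(X), lam=0.1)
        print(K, [round(b, 4) for b in beta])
```

At convergence, every consensus ADMM fixed point satisfies the optimality conditions of the full objective. Splitting the same rows into 2 or 4 partitions should therefore give the same coefficients up to solver tolerance. This is a weaker, generic cousin of the partition insensitivity that the abstract claims for its own algorithm.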