
[2504.16585] Leveraging Noisy Manual Labels as Useful Information: An Information Fusion Approach for Enhanced Variable Selection in Penalized Logistic Regression

arXiv - Machine Learning · February 16, 2026 · 4 min read · Article

Summary

This paper argues that label noise introduced during manual annotation can improve variable selection in penalized logistic regression (PLR), and proposes a parallel ADMM-based algorithm that exploits this noise in large-scale settings.

Why It Matters

Label quality strongly affects model accuracy. This research treats label noise, traditionally viewed as a drawback, as a source of information, which could change how manually annotated data is used in model training.

Key Takeaways

Label noise can be leveraged to improve variable selection in penalized logistic regression.
The proposed algorithm ensures global convergence and is efficient for large-scale data.
Experiments show significant performance improvements over conventional methods.

Computer Science > Machine Learning · arXiv:2504.16585 (cs)

[Submitted on 23 Apr 2025 (v1), last revised 13 Feb 2026 (this version, v2)]

Title: Leveraging Noisy Manual Labels as Useful Information: An Information Fusion Approach for Enhanced Variable Selection in Penalized Logistic Regression

Authors: Xiaofei Wu, Rongmei Liangse

Abstract: In large-scale supervised learning, penalized logistic regression (PLR) effectively mitigates overfitting through regularization, yet its performance critically depends on robust variable selection. This paper demonstrates that label noise introduced during manual annotation, often dismissed as a mere artifact, can serve as a valuable source of information to enhance variable selection in PLR. We theoretically show that such noise, intrinsically linked to classification difficulty, helps refine the estimation of non-zero coefficients compared to using only ground truth labels, effectively turning a common imperfection into a useful information resource. To efficiently leverage this form of information fusion in large-scale settings where data cannot be stored on a single machine, we propose a novel partition insensitive parallel algorithm based on the alternating direction method of multipliers (ADMM). ...
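For readers unfamiliar with how penalized logistic regression performs variable selection, here is a minimal sketch. It is not the paper's method: it fits plain L1-penalized logistic regression with proximal gradient descent (ISTA) on a single machine, rather than the parallel ADMM algorithm, and it does not use noisy labels at all. The synthetic data, the function names (`fit_l1_logistic`, `make_data`), and the values of `lam`, `step`, and `iters` are all assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||beta||_1 by proximal gradient.

    lam, step and iters are illustrative defaults, not tuned values.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Gradient of the mean logistic loss at the current beta.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += resid * xi[j] / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def make_data(n=400, p=6, seed=0):
    """Hypothetical data: only the first two features affect the label."""
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(p)]
        prob = sigmoid(sum(b * v for b, v in zip(true_beta, xi)))
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y, true_beta


X, y, true_beta = make_data()
beta_hat = fit_l1_logistic(X, y)
# Variables whose coefficient was not shrunk exactly to zero.
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
print("selected variables:", selected)
```

The soft-thresholding step is what sets small coefficients exactly to zero, so the set of non-zero entries in `beta_hat` acts as the selected variables. The abstract's claim is that the estimation of these non-zero coefficients can be refined by also using noisy manual labels; how that fusion is done is not shown here.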

