[2602.19130] Detecting labeling bias using influence functions
Summary
This article explores the use of influence functions to detect labeling bias in datasets, demonstrating their effectiveness in identifying mislabeled samples in both the MNIST and CheXpert datasets.
Why It Matters
Labeling bias can significantly impact the fairness and accuracy of machine learning models. This research provides a method to detect such biases, which is crucial for developing equitable AI systems. By leveraging influence functions, the study addresses a critical challenge in data science and machine learning, enhancing model reliability and fairness.
Key Takeaways
- Influence functions can effectively identify mislabeled samples in training datasets.
- The study demonstrated nearly 90% accuracy in detecting label errors in the MNIST dataset.
- Mislabeled samples in the CheXpert dataset consistently received higher influence scores than correctly labeled ones.
- Addressing labeling bias is essential for improving the fairness of AI models.
- The proposed method can be applied to various datasets, enhancing model training integrity.
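The detection idea in the takeaways above can be sketched end to end. The snippet below is a minimal illustration, not the paper's pipeline: it uses synthetic 2-D data and logistic regression (my own assumptions) but mirrors the paper's setup of flipping 20% of one class's labels, then scores each training sample by its self-influence under a diagonal Hessian (Fisher) approximation. Flipped samples should receive noticeably higher scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification: two well-separated Gaussian blobs
# (hypothetical stand-in for MNIST/CheXpert features).
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, size=(n, 2)),
               rng.normal(+2.0, 1.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Inject labeling bias: flip 20% of the labels of one class,
# matching the controlled-noise setup described in the paper.
flip_idx = rng.choice(np.arange(n, 2 * n), size=n // 5, replace=False)
y_noisy = y.copy()
y_noisy[flip_idx] = 0.0

# Add a bias feature and fit logistic regression by gradient descent.
Xb = np.hstack([X, np.ones((2 * n, 1))])
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
    w -= 0.1 * Xb.T @ (p - y_noisy) / len(y_noisy)

# Per-sample loss gradients: g_i = (p_i - y_i) * x_i.
p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
G = (p - y_noisy)[:, None] * Xb

# Diagonal Hessian (Fisher) approximation with damping -- a cheap
# stand-in for the full Hessian, as in the paper's approximation.
h_diag = (G ** 2).mean(axis=0) + 1e-3

# Self-influence score: g_i^T H^{-1} g_i with H approximated by diag(h_diag).
# Mislabeled samples sit on the wrong side of the boundary, so their
# gradients (and hence scores) are large.
scores = ((G ** 2) / h_diag).sum(axis=1)

clean = np.setdiff1d(np.arange(2 * n), flip_idx)
print("mean score (flipped):", scores[flip_idx].mean())
print("mean score (clean):  ", scores[clean].mean())
```

Ranking samples by this score and auditing the top fraction is the basic recipe for surfacing label errors.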
Computer Science > Machine Learning
arXiv:2602.19130 (cs)
[Submitted on 22 Feb 2026]
Title: Detecting labeling bias using influence functions
Authors: Frida Jørgensen, Nina Weng, Siavash Bigdeli
Abstract: Labeling bias arises during data collection due to resource limitations or unconscious bias, leading to unequal label error rates across subgroups or misrepresentation of subgroup prevalence. Most fairness constraints assume training labels reflect the true distribution, rendering them ineffective when labeling bias is present and leaving a challenging question: how can we detect such labeling bias? In this work, we investigate whether influence functions can be used to detect labeling bias. Influence functions estimate how much each training sample affects a model's predictions by leveraging the gradient and Hessian of the loss function; when labeling errors occur, influence functions can identify wrongly labeled samples in the training set, revealing the underlying failure mode. We develop a sample valuation pipeline and test it first on the MNIST dataset, then scale it to the more complex CheXpert medical imaging dataset. To examine label noise, we introduce controlled errors by flipping 20% of the labels for one class in the dataset. Using a diagonal Hessian approximation, we demonstrate promising results, ...
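The gradient-and-Hessian mechanism the abstract describes follows the standard influence-function formulation (the exact expression below is the common one from the influence-functions literature, not quoted from this paper). With trained parameters $\hat\theta$ and training loss Hessian $H_{\hat\theta}$, the influence of up-weighting a training point $z$ on the loss at a test point $z_{\text{test}}$ is:

```latex
% Standard influence-function formulation (assumed, not verbatim from the paper)
\mathcal{I}(z, z_{\text{test}})
  = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top
    H_{\hat\theta}^{-1}\,
    \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta)
```

Setting $z_{\text{test}} = z$ gives a self-influence score; mislabeled samples tend to score high because the model must "work hard" to fit them. Replacing $H_{\hat\theta}$ with its diagonal, as the paper does, makes the inversion cheap enough to scale to datasets like CheXpert.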