[2505.04733] Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
Summary
This paper presents a framework for robust uncertainty quantification in machine learning when training labels are corrupted, i.e., noisy or missing. It introduces methods that re-weight the calibration data and impute labels so that prediction sets remain valid despite the corruption.
Why It Matters
As machine learning models increasingly rely on labeled data, corrupted labels can significantly degrade the reliability of their predictions. This research addresses the challenge of producing valid uncertainty estimates under such conditions, making it relevant for both practitioners and researchers.
Key Takeaways
- Introduces a framework for uncertainty quantification in corrupted label scenarios.
- Analyzes the robustness of the Privileged Conformal Prediction (PCP) method, which re-weights data using privileged information available only at training time.
- Demonstrates that valid predictions can be achieved even with poorly estimated weights.
- Introduces Uncertain Imputation (UI) to handle corrupted labels without weight reliance.
- Validates the proposed methods through empirical testing on synthetic and real datasets.
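The takeaways above build on standard conformal prediction, which turns any model's calibration residuals into prediction sets with a pre-specified coverage level. A minimal split-conformal sketch on toy regression data (the model and data here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = x + Gaussian noise.
n_cal, n_test = 1000, 1000
x_cal = rng.uniform(0, 1, n_cal)
y_cal = x_cal + rng.normal(0, 0.1, n_cal)
x_test = rng.uniform(0, 1, n_test)
y_test = x_test + rng.normal(0, 0.1, n_test)

def predict(x):
    # Stand-in for any fitted regression model; here the true mean.
    return x

alpha = 0.1  # target miscoverage level

# Split conformal: absolute-residual scores on the calibration set.
scores = np.abs(y_cal - predict(x_cal))
# Conformal quantile with the finite-sample (n+1) correction.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

# The interval [predict(x) - q, predict(x) + q] should cover
# roughly a 1 - alpha fraction of the test labels.
covered = np.abs(y_test - predict(x_test)) <= q
print(f"empirical coverage: {covered.mean():.3f}")
```

The i.i.d. assumption between calibration and test points is exactly what label corruption breaks, which motivates the re-weighting and imputation methods the paper studies.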
Computer Science > Machine Learning
arXiv:2505.04733 (cs)
[Submitted on 7 May 2025 (v1), last revised 26 Feb 2026 (this version, v3)]
Title: Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
Authors: Shai Feldman, Stephen Bates, Yaniv Romano
Abstract: We introduce a framework for robust uncertainty quantification in situations where labeled training data are corrupted, through noisy or missing labels. We build on conformal prediction, a statistical tool for generating prediction sets that cover the test label with a pre-specified probability. The validity of conformal prediction, however, holds under the i.i.d. assumption, which does not hold in our setting due to the corruptions in the data. To account for this distribution shift, the privileged conformal prediction (PCP) method was proposed, leveraging privileged information (PI) -- additional features available only during training -- to re-weight the data distribution, yielding valid prediction sets under the assumption that the weights are accurate. In this work, we analyze the robustness of PCP to inaccuracies in the weights. Our analysis indicates that PCP can still yield valid uncertainty estimates even when the weights are poorly estimated. Furthermore, we introduce uncertain imputation (UI), a new con...
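The re-weighting idea the abstract describes can be sketched with a weighted calibration quantile: instead of an ordinary empirical quantile of the scores, each calibration point contributes in proportion to an importance weight. A minimal sketch, in which the scores and weights are purely illustrative (the paper's PI-based weight estimates are not reproduced here):

```python
import numpy as np

def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative normalized weight reaches tau."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    # Guard against floating-point shortfall at the upper tail.
    idx = min(np.searchsorted(cdf, tau), len(v) - 1)
    return v[idx]

rng = np.random.default_rng(1)

# Hypothetical calibration scores from a corrupted sample, with
# illustrative importance weights standing in for an estimated
# likelihood ratio between clean and corrupted distributions.
scores = np.abs(rng.normal(0.0, 1.0, 500))
weights = rng.uniform(0.5, 1.5, 500)

alpha = 0.1
q_hat = weighted_quantile(scores, weights, 1 - alpha)
# The prediction set at a test point x is {y : score(x, y) <= q_hat}.
```

With all weights equal this reduces to the ordinary split-conformal quantile; the paper's robustness analysis concerns what happens to coverage when the weights are misestimated.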