[2602.16794] Beyond Procedure: Substantive Fairness in Conformal Prediction
Summary
This paper studies substantive fairness in conformal prediction: rather than treating prediction sets as a standalone output, it analyzes how their construction shapes downstream decision-making and identifies CP variants that yield more equitable outcomes.
Why It Matters
As machine learning systems increasingly influence critical decisions, ensuring fairness in their predictions is vital. This research addresses the gap in understanding how conformal prediction can be adapted to promote substantive fairness, which is crucial for equitable outcomes in various applications.
Key Takeaways
- Conformal prediction (CP) can enhance fairness in decision-making processes.
- The study introduces an LLM-in-the-loop evaluator for assessing substantive fairness.
- Label-clustered CP variants show improved fairness outcomes compared to traditional methods.
- Equalized prediction-set sizes, rather than equalized coverage, correlate strongly with improved substantive fairness.
- The research provides a theoretical framework for understanding fairness disparities in predictions.
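To make the takeaways concrete, here is a minimal sketch of standard split conformal prediction for classification, the baseline procedure the paper's label-clustered variants refine. The function name and the simple `1 - p(true class)` nonconformity score are illustrative choices, not the authors' implementation.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets for a K-class classifier.

    cal_probs:  (n_cal, K) softmax scores on a held-out calibration set
    cal_labels: (n_cal,)   true labels for the calibration set
    test_probs: (n_test, K) softmax scores for test points
    Returns a boolean (n_test, K) matrix whose rows are prediction sets
    with marginal coverage at least 1 - alpha.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    # Include every label whose score falls at or below the threshold.
    return (1.0 - test_probs) <= qhat
```

Label-clustered variants replace the single global quantile `qhat` with one quantile per cluster of labels, which the paper's bound links to smaller method-driven set-size disparities.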
arXiv Details
arXiv:2602.16794 (stat) — Statistics > Machine Learning. Submitted on 18 Feb 2026.
Authors: Pengqi Liu, Zijun Yu, Mouloud Belbahri, Arthur Charpentier, Masoud Asgharian, Jesse C. Cresswell
Abstract: Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness, the equity of downstream outcomes. Theoretically, we derive an upper bound that decomposes prediction-set size disparity into interpretable components, clarifying how label-clustered CP helps control method-driven contributions to unfairness. To facilitate scalable empirical analysis, we introduce an LLM-in-the-loop evaluator that approximates human assessment of substantive fairness across diverse modalities. Our experiments reveal that label-clustered CP variants consistently deliver superior substantive fairness. Finally, we empirically show that equalized set sizes, rather than coverage, strongly correlate with improved substantive fairness, enabling practitioners to design fairer CP systems. Our code is available at this https URL.
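The abstract's central quantity is prediction-set size disparity across sensitive groups. A hedged sketch of how such a disparity could be measured (the function name and the max-gap definition are illustrative assumptions, not the paper's exact metric):

```python
import numpy as np

def set_size_disparity(pred_sets, groups):
    """Largest gap in mean prediction-set size across sensitive groups.

    pred_sets: boolean (n, K) matrix from a conformal predictor,
               where row i marks the labels in point i's prediction set
    groups:    (n,) array of group identifiers (e.g. demographic group)
    """
    sizes = pred_sets.sum(axis=1)  # set size per test point
    means = [sizes[groups == g].mean() for g in np.unique(groups)]
    return max(means) - min(means)
```

Under the paper's finding, driving this gap toward zero (equalizing set sizes rather than coverage) is the lever most strongly associated with better substantive fairness downstream.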