[2411.01685] Reducing Biases in Record Matching Through Scores Calibration
Summary
This paper explores methods to reduce biases in record matching through score calibration, proposing two model-agnostic post-processing techniques that align score distributions to enhance fairness without retraining models.
Why It Matters
Bias in record matching can lead to unfair outcomes in downstream applications such as hiring and credit scoring. Because the proposed calibration methods are post hoc and model-agnostic, they can mitigate score bias in already-deployed matching systems, promoting equitable treatment across demographic groups without retraining.
Key Takeaways
- Introduces a threshold-independent metric for assessing score bias in record matching.
- Proposes two calibration methods (Calib and C-Calib) to reduce score bias without retraining models.
- Demonstrates substantial bias reduction with minimal accuracy loss across various benchmarks.
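The threshold-independent metric in the first takeaway integrates group-wise fairness gaps over all decision thresholds rather than fixing one. As a rough illustration (the function name and grid-based integration are our own simplification, not the paper's exact formulation), the demographic-parity version can be approximated by averaging the gap in positive-prediction rates between two groups across a threshold grid:

```python
import numpy as np

def dp_score_bias(scores_a, scores_b, grid_size=1000):
    """Approximate the threshold-integrated demographic-parity gap:
    the integral over t in [0, 1] of |P(s > t | group A) - P(s > t | group B)|.

    A sketch only: the paper defines analogous integrated gaps for
    equal opportunity and equalized odds as well.
    """
    thresholds = np.linspace(0.0, 1.0, grid_size)
    # Positive-prediction rate for each group at every threshold.
    rate_a = np.array([(scores_a > t).mean() for t in thresholds])
    rate_b = np.array([(scores_b > t).mean() for t in thresholds])
    # Mean gap over a unit interval approximates the integral.
    return float(np.mean(np.abs(rate_a - rate_b)))
```

A matcher can score zero gap at one threshold yet large integrated bias, which is exactly the disparity a fixed-threshold evaluation misses.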
arXiv:2411.01685 [cs] — Computer Science > Machine Learning
Submitted on 3 Nov 2024 (v1); last revised 22 Feb 2026 (v3)
Title: Reducing Biases in Record Matching Through Scores Calibration
Authors: Mohammad Hossein Moslemi, Mostafa Milani
Abstract: Record matching models typically output a real-valued matching score that is later consumed through thresholding, ranking, or human review. While fairness in record matching has mostly been assessed using binary decisions at a fixed threshold, such evaluations can miss systematic disparities in the entire score distribution and can yield conclusions that change with the chosen threshold. We introduce a threshold-independent notion of score bias that extends standard group-fairness criteria, namely demographic parity (DP), equal opportunity (EO), and equalized odds (EOD), from binary outputs to score functions by integrating group-wise metric gaps over all thresholds. Using this metric, we empirically show that several state-of-the-art deep matchers can exhibit substantial score bias even when appearing fair at commonly used thresholds. To mitigate these disparities without retraining the underlying matcher, we propose two model-agnostic post-processing methods that only require score evaluations on an (unlabeled) calibration set. Calib targets DP by aligning ...
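The abstract describes post-processing that aligns group score distributions on an unlabeled calibration set. One standard way to align two score distributions, shown here purely as an illustrative sketch and not as the paper's Calib algorithm, is quantile matching: map each score in one group to the value at the same empirical quantile of a reference distribution.

```python
import numpy as np

def align_scores(scores_group, scores_ref):
    """Quantile-matching sketch of distribution alignment.

    Maps each score in `scores_group` onto the empirical distribution of
    `scores_ref` at the same rank, preserving the within-group ordering
    (and hence within-group ranking decisions) while equalizing the two
    groups' score distributions. Illustrative only; the paper's Calib
    method may differ in detail.
    """
    n = len(scores_group)
    # Rank of each score within its group, mapped to quantiles in (0, 1).
    ranks = np.argsort(np.argsort(scores_group))
    quantiles = (ranks + 0.5) / n
    # Read off the reference distribution at those quantiles.
    return np.quantile(scores_ref, quantiles)
```

Because the mapping is monotone, any threshold or ranking applied after alignment orders records within a group exactly as before; only the cross-group score disparity is reduced.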