[2602.21509] Fair Model-based Clustering
Summary
The paper presents Fair Model-based Clustering (FMC), a fair clustering algorithm based on a finite mixture model. FMC seeks clusters in which the proportion of each sensitive attribute is similar to that of the overall dataset, while scaling to large samples and remaining applicable to non-metric data.
Why It Matters
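Fairness in machine learning is crucial for ethical AI applications. Several existing fair clustering algorithms must optimize a cluster assignment for every datum together with the cluster centers, so the number of parameters grows with the sample size and the algorithms are hard to scale. FMC removes this limitation, which makes it relevant to researchers and practitioners who need fair clustering on large datasets.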
Key Takeaways
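- The number of learnable parameters in FMC is independent of the sample size, so the algorithm scales easily and supports mini-batch learning that yields approximately fair clusters.
- The algorithm can handle non-metric data (e.g., categorical data) as long as the likelihood is well-defined.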
- Empirical and theoretical justifications support the effectiveness of FMC in achieving fairness.
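The first two takeaways can be illustrated with a generic finite mixture over categorical features. The sketch below is not the FMC algorithm and contains no fairness mechanism; the function names, the toy model, and the independent-features likelihood are all assumptions made for illustration. It only shows that a datum's cluster membership can be computed from the model parameters alone, so no per-datum assignment needs to be learned.

```python
import math

def responsibilities(x, weights, probs):
    """Posterior cluster probabilities for one categorical datum.

    x       : tuple of category indices, one per feature
    weights : list of K mixing weights that sum to 1
    probs   : probs[k][j][c] = P(feature j takes category c | cluster k)
    """
    log_joint = []
    for k, w in enumerate(weights):
        ll = math.log(w)
        for j, c in enumerate(x):
            ll += math.log(probs[k][j][c])
        log_joint.append(ll)
    # Normalize in log space for numerical stability.
    m = max(log_joint)
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def num_parameters(weights, probs):
    """Count stored model entries; the count does not involve the sample size."""
    return len(weights) + sum(len(cat) for cluster in probs for cat in cluster)

# Hypothetical toy model: K = 2 clusters, one binary and one 3-category feature.
weights = [0.5, 0.5]
probs = [
    [[0.9, 0.1], [0.8, 0.1, 0.1]],
    [[0.2, 0.8], [0.1, 0.1, 0.8]],
]

r = responsibilities((0, 0), weights, probs)
n_params = num_parameters(weights, probs)
```

Because `responsibilities` needs only a likelihood, the same pattern works for data with no meaningful distance. The entry count stays fixed whether the model is evaluated on ten data points or ten million.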
Statistics > Machine Learning
arXiv:2602.21509 (stat)
[Submitted on 25 Feb 2026]
Title: Fair Model-based Clustering
Authors: Jinwon Park, Kunwoong Kim, Jihu Lee, Yongdai Kim
Abstract: The goal of fair clustering is to find clusters such that the proportion of sensitive attributes (e.g., gender, race, etc.) in each cluster is similar to that of the entire dataset. Various fair clustering algorithms have been proposed that modify standard K-means clustering to satisfy a given fairness constraint. A critical limitation of several existing fair clustering algorithms is that the number of parameters to be learned is proportional to the sample size because the cluster assignment of each datum should be optimized simultaneously with the cluster center, and thus scaling up the algorithms is difficult. In this paper, we propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC). A main advantage of FMC is that the number of learnable parameters is independent of the sample size and thus can be scaled up easily. In particular, mini-batch learning is possible to obtain clusters that are approximately fair. Moreover, FMC can be applied to non-metric data (e.g., categorical data) as long as the likelihood is well-defined. Theoretical and empirical justifications for the superiority of the proposed algorithm are p...
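The fairness goal stated in the abstract can be made concrete with a simple check. The following is a minimal sketch, assuming hard cluster labels and a single sensitive attribute. The function names and the "maximum proportion gap" measure are illustrative choices, not the fairness constraint used in the paper.

```python
from collections import Counter

def group_proportions(groups):
    """Share of each sensitive group within a list of group labels."""
    counts = Counter(groups)
    n = len(groups)
    return {g: c / n for g, c in counts.items()}

def max_proportion_gap(labels, groups):
    """Largest absolute difference between a group's share inside any
    cluster and its share in the whole dataset (0.0 means the cluster
    proportions match the dataset exactly)."""
    overall = group_proportions(groups)
    worst = 0.0
    for k in set(labels):
        members = [g for lab, g in zip(labels, groups) if lab == k]
        local = group_proportions(members)
        for g, p in overall.items():
            worst = max(worst, abs(local.get(g, 0.0) - p))
    return worst

# Hypothetical toy data: 8 points, 2 clusters, sensitive groups "a" and "b".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "b", "a", "b", "a", "a", "a", "b"]
gap = max_proportion_gap(labels, groups)

# A clustering whose clusters each reproduce the overall 50/50 split.
balanced_gap = max_proportion_gap([0, 0, 1, 1], ["a", "b", "a", "b"])
```

In the toy data the overall shares are 5/8 and 3/8 while the two clusters hold 1/2 and 3/4 of group "a", so a fair clustering method would aim to drive this gap toward zero.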