[2512.20363] Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning
Summary
The paper presents Clust-PSI-PFL, a novel framework for personalized federated learning that addresses challenges posed by non-IID data through a Population Stability Index approach, enhancing model accuracy and client fairness.
Why It Matters
As federated learning becomes increasingly important for privacy-preserving AI, addressing non-IID data is crucial for improving model performance and equity among clients. This research offers a significant advancement in clustering techniques that can enhance federated learning applications across various domains.
Key Takeaways
- Clust-PSI-PFL uses the Population Stability Index (PSI) to quantify how far client data departs from IID in federated learning.
- The framework improves global accuracy by up to 18% compared to existing methods.
- Client fairness is enhanced by 37% under severe non-IID conditions.
- A systematic silhouette-based procedure is employed to determine optimal client clusters.
- The method is applicable across diverse datasets, including tabular, image, and text.
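To make the PSI takeaway concrete, here is a minimal sketch of the standard PSI formula applied to label counts. The paper's exact definition of its weighted metric is not given in this summary, so `weighted_psi` below, which averages per-client PSI weighted by sample size against the pooled label distribution, is an illustrative assumption and not the authors' formula. The function names and the `eps` smoothing are likewise hypothetical.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Standard Population Stability Index between two binned distributions.

    expected, actual: non-negative counts over the same bins (here, classes).
    eps is an assumed floor that keeps empty bins from producing log(0).
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must share the same bins")
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        p = max(e / e_total, eps)  # reference proportion for this bin
        q = max(a / a_total, eps)  # client proportion for this bin
        total += (q - p) * math.log(q / p)
    return total


def weighted_psi(client_counts):
    """ASSUMPTION: sample-size-weighted mean of per-client PSI.

    Each client's label counts are compared with the pooled counts of all
    clients. This is one plausible weighting, not the paper's definition.
    """
    n_bins = len(client_counts[0])
    pooled = [sum(c[i] for c in client_counts) for i in range(n_bins)]
    n_total = sum(pooled)
    return sum(sum(c) / n_total * psi(pooled, c) for c in client_counts)


# Two hypothetical federations with two classes each.
near_iid = [[50, 50], [48, 52]]
skewed = [[95, 5], [5, 95]]
print(weighted_psi(near_iid), weighted_psi(skewed))
```

Identical distributions give a PSI of zero, and the value grows as a client's label proportions drift from the reference, so the skewed federation scores higher than the near-IID one.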
Computer Science > Machine Learning
arXiv:2512.20363 (cs)
[Submitted on 23 Dec 2025 (v1), last revised 20 Feb 2026 (this version, v2)]
Title: Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning
Authors: Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, David Solans, Mohammed Elbamby, Nicolas Kourtellis, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
Abstract: Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distributed (non-IID) data across clients biases updates and degrades performance. To alleviate these issues, we propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify the level of non-IID data. We compute a weighted PSI metric, $WPSI^L$, which we show to be more informative than common non-IID metrics (Hellinger, Jensen-Shannon, and Earth Mover's distance). Using PSI features, we form distributionally homogeneous groups of clients via K-means++; the number of optimal clusters is chosen by a systematic silhouette-based procedure, typically yielding few clusters with modest overhead. Acros...
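The clustering step described in the abstract (K-means++ on per-client PSI features, with the number of clusters chosen by silhouette score) can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the authors' procedure: the abstract does not say how the PSI features are built, so the `features` vectors are hypothetical placeholders, and `kmeans_pp`, `silhouette`, and `choose_k` are illustrative helper names.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_pp(points, k, rng, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns one label per point."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # k-means++: sample the next centre proportionally to the squared
        # distance from the nearest centre chosen so far.
        d2 = [min(dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())     # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values, seed=0):
    """Return the candidate k with the highest silhouette, plus all scores."""
    rng = random.Random(seed)
    scored = {k: silhouette(points, kmeans_pp(points, k, rng)) for k in k_values}
    return max(scored, key=scored.get), scored


# Hypothetical per-client PSI feature vectors forming three loose groups.
features = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
            [2.0, 2.1], [2.1, 2.0], [2.05, 2.05],
            [4.0, 0.1], [4.1, 0.0], [4.05, 0.05]]
best_k, scores = choose_k(features, range(2, 6))
print(best_k, scores)
```

In practice a library implementation, such as scikit-learn's `KMeans` with `init="k-means++"` together with `silhouette_score`, would replace the hand-written helpers; they are spelled out here only to keep the sketch dependency-free.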