
[2505.22554] A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

arXiv - Machine Learning | February 25, 2026 | 4 min read | Article

Summary

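This article presents a copula-based supervised filter for feature selection in diabetes risk prediction. The filter ranks features by a Gumbel-copula upper-tail concordance score (lambda U), a monotone transformation of Kendall's tau, and is evaluated against four common baselines across four classifiers on two diabetes datasets.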

Why It Matters

Effective feature selection is critical for robust and interpretable predictive modeling in medicine, particularly when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss predictors whose relevance is concentrated in the distribution tails. This study proposes a filter designed to surface those predictors, which may support more accurate risk assessment.

Key Takeaways

Introduces a copula-based method for feature selection in diabetes risk prediction.
Demonstrates improved efficiency by reducing features while maintaining predictive power.
Highlights the importance of focusing on predictors in the distribution tails for better model performance.
Compares favorably against four common baselines: Mutual Information, mRMR, ReliefF, and L1/Elastic-Net.
Provides a clinically coherent approach that can complement existing methods in public health.

Statistics > Machine Learning
arXiv:2505.22554 (stat)
[Submitted on 28 May 2025 (v1), last revised 24 Feb 2026 (this version, v5)]

Title: A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

Authors: Agnideep Aich, Md Monzur Murshed, Sameera Hewage, Amanda Mayeaux

Abstract: Effective feature selection is critical for robust and interpretable predictive modeling in medicine, especially when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss predictors whose relevance is concentrated in the distribution tails. We propose a computationally efficient supervised filter based on a Gumbel-copula implied upper-tail concordance score (lambda U), defined as a monotone transformation of Kendall's tau, to rank features by their tendency to be simultaneously extreme with the positive class. We compare against four common baselines (Mutual Information, mRMR, ReliefF, and L1/Elastic-Net) across four classifiers on two diabetes datasets: a large-scale public health survey (CDC, N=253,680) and a clinical benchmark (PIMA, N=768). Analyses include statistical testing, permutation importance, and robustness checks. On CDC, the proposed selector is the fastest and reduces 21 f...
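To make the scoring idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. It relies on two standard Gumbel-copula identities, tau = 1 - 1/theta and lambda U = 2 - 2^(1/theta), which combine to lambda U = 2 - 2^(1 - tau). Several choices below are assumptions made for illustration: clamping negative tau to zero, using the tau-b variant to handle ties with a binary label, and the function names and toy feature values, which are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed pairwise in O(n^2). Fine for a small demo."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both counts
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_y_only) * (untied + tied_x_only))
    return (concordant - discordant) / denom if denom else 0.0


def gumbel_lambda_u(tau):
    """Gumbel-implied upper-tail score: lambda_U = 2 - 2**(1 - tau).

    Assumption: tau is clamped to [0, 1], because the Gumbel copula
    only models non-negative dependence.
    """
    tau = min(max(tau, 0.0), 1.0)
    return 2.0 - 2.0 ** (1.0 - tau)


def rank_features(features, labels):
    """Return (name, score) pairs sorted by descending lambda_U."""
    scores = {
        name: gumbel_lambda_u(kendall_tau_b(values, labels))
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 marks the positive (diabetic) class.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "glucose": [85, 90, 99, 140, 150, 170],  # rises with the label
    "noise": [3, 1, 2, 2, 3, 1],             # unrelated to the label
}
ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because lambda U is a monotone function of tau on [0, 1], the ranking it produces for positively associated features matches the ranking by tau itself; the transformation re-expresses that association on the upper-tail dependence scale. For datasets the size of the CDC survey, a pairwise loop like this would be too slow, and an O(n log n) routine such as `scipy.stats.kendalltau` would be the practical choice.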

