[2406.16227] VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data
Statistics > Machine Learning
arXiv:2406.16227 (stat)
[Submitted on 23 Jun 2024 (v1), last revised 1 Mar 2026 (this version, v2)]

Title: VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data
Authors: Jackie Rao, Paul D. W. Kirk

Abstract: Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of efficiency, while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarisation and model averaging to mitigate poor local optima in VI, allowing for improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas (TCGA), showing...