[2602.23994] MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening
Computer Science > Machine Learning
arXiv:2602.23994 (cs)
[Submitted on 27 Feb 2026]

Title: MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening
Authors: Vrushank Ahire, Yogesh Kumar, Anouck Girard, M. A. Ganaie

Abstract: Alzheimer's disease is a progressive neurodegenerative disorder in which mild cognitive impairment (MCI) marks a critical transition between aging and dementia. Neuroimaging modalities such as structural MRI provide biomarkers of this transition; however, their high cost and infrastructure requirements limit deployment at population scale. Speech analysis offers a non-invasive alternative, but speech-only classifiers are developed independently of neuroimaging, leaving their decision boundaries biologically ungrounded and limiting reliability on the subtle CN-versus-MCI distinction. We propose MINT (Multimodal Imaging-to-Speech Knowledge Transfer), a three-stage cross-modal framework that transfers biomarker structure from MRI into a speech encoder at training time. An MRI teacher, trained on 1,228 subjects, defines a compact neuroimaging embedding space for CN-versus-MCI classification. A residual projection head aligns speech representations to this frozen imaging manifold via a combined geometric loss, adapting speech to the learned biomarker ...
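To make the alignment step concrete, below is a minimal PyTorch sketch of a residual projection head that maps speech-encoder features into a frozen MRI embedding space and trains against a combined geometric loss. The embedding dimensions, the specific MSE-plus-cosine mix, and all names here are illustrative assumptions; the abstract does not specify the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualProjectionHead(nn.Module):
    """Maps speech features into the (frozen) MRI embedding space.

    A direct linear projection plus a learned MLP residual, so alignment
    training perturbs the speech representation gently. Dimensions are
    assumptions, not the paper's reported values.
    """

    def __init__(self, speech_dim=768, imaging_dim=128, hidden_dim=256):
        super().__init__()
        self.linear = nn.Linear(speech_dim, imaging_dim)   # direct projection
        self.mlp = nn.Sequential(                          # learned residual
            nn.Linear(speech_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, imaging_dim),
        )

    def forward(self, speech_feat):
        return self.linear(speech_feat) + self.mlp(speech_feat)


def geometric_alignment_loss(proj_speech, mri_embed, alpha=0.5):
    """Combined geometric loss (illustrative): MSE pulls projected speech
    toward the matching MRI embedding; cosine distance preserves angular
    structure. The 50/50 weighting is an assumed choice."""
    mse = F.mse_loss(proj_speech, mri_embed)
    cos = 1.0 - F.cosine_similarity(proj_speech, mri_embed, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos


# Illustrative training step: the MRI teacher's embeddings are fixed targets;
# only the projection head (and optionally the speech encoder) gets gradients.
if __name__ == "__main__":
    head = ResidualProjectionHead()
    speech_feat = torch.randn(8, 768)   # batch of speech-encoder embeddings
    mri_embed = torch.randn(8, 128)     # matching frozen MRI-teacher embeddings
    loss = geometric_alignment_loss(head(speech_feat), mri_embed)
    loss.backward()
    print(f"alignment loss: {loss.item():.4f}")
```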