[2603.26726] A Multimodal Deep Learning Framework for Edema Classification Using HCT and Clinical Data
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.26726 (cs) [Submitted on 20 Mar 2026]

Title: A Multimodal Deep Learning Framework for Edema Classification Using HCT and Clinical Data

Authors: Aram Ansary Ogholbake, Hannah Choi, Spencer Brandenburg, Alyssa Antuna, Zahraa Al-Sharshahi, Makayla Cox, Haseeb Ahmed, Jacqueline Frank, Nathan Millson, Luke Bauerle, Jessica Lee, David Dornbos III, Qiang Cheng

Abstract: We propose AttentionMixer, a unified deep learning framework for multimodal detection of brain edema that combines structural head CT (HCT) with routine clinical metadata. While HCT provides rich spatial information, clinical variables such as age, laboratory values, and scan timing capture complementary context that is often ignored or only naively concatenated. AttentionMixer is designed to fuse these heterogeneous sources in a principled and efficient manner. HCT volumes are first encoded using a self-supervised Vision Transformer Autoencoder (ViT-AE++), without requiring large labeled datasets. Clinical metadata are mapped into the same feature space and used as keys and values in a cross-attention module, where the HCT-derived feature vectors serve as queries. This cross-attention fusion allows the network to dynamically modulate imaging features based on pat...
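The fusion scheme described in the abstract — imaging features as queries, clinical metadata projected into the shared feature space as keys and values — can be sketched in PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the class name `CrossAttentionFusion`, the embedding dimension, head count, and number of clinical variables are all hypothetical choices.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hedged sketch of the cross-attention fusion from the abstract:
    the pooled HCT embedding (e.g. from a ViT-AE++ encoder) serves as
    the query; clinical variables, each projected into the same feature
    space, serve as keys and values. All dimensions are assumptions."""

    def __init__(self, dim: int = 256, n_clinical: int = 8, n_heads: int = 4):
        super().__init__()
        # Map each scalar clinical variable (age, lab value, scan timing, ...)
        # to one token in the shared feature space.
        self.clin_proj = nn.Linear(1, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, dim)  pooled imaging embedding -> a single query token
        # clinical: (B, n_clinical)  raw clinical metadata
        q = img_feat.unsqueeze(1)                    # (B, 1, dim)
        kv = self.clin_proj(clinical.unsqueeze(-1))  # (B, n_clinical, dim)
        fused, _ = self.attn(q, kv, kv)              # attend over metadata tokens
        # Residual connection: imaging features modulated by clinical context.
        return self.norm(img_feat + fused.squeeze(1))

fusion = CrossAttentionFusion()
img = torch.randn(2, 256)   # batch of 2 pooled HCT embeddings
clin = torch.randn(2, 8)    # 8 (standardized) clinical variables each
out = fusion(img, clin)
print(out.shape)  # torch.Size([2, 256])
```

The residual-plus-norm wrapping is one common design choice for letting the clinical signal modulate, rather than replace, the imaging representation; the paper may use a different head configuration or pooling strategy.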