[2501.18891] CAAT-EHR: Cross-Attentional Autoregressive Transformer for Multimodal Electronic Health Record Embeddings
Summary
The CAAT-EHR paper introduces a Cross-Attentional Autoregressive Transformer that generates generalizable embeddings from multimodal Electronic Health Records (EHRs), improving predictive accuracy across clinical tasks.
Why It Matters
This research addresses a limitation of existing EHR analysis methods: they are typically optimized for a single downstream task, which hinders the development of versatile patient representations. By proposing a unified, task-agnostic framework, CAAT-EHR aims to improve clinical decision-making and predictive analytics in healthcare.
Key Takeaways
- CAAT-EHR generates task-agnostic embeddings from multimodal EHR data.
- The model employs self-attention and cross-attention mechanisms to capture temporal dependencies and intermodal relationships.
- Significant performance improvements were observed in mortality prediction and other clinical tasks compared to raw EHR data.
- Ablation studies highlight the importance of cross-modality fusion and autoregressive refinement in the model's effectiveness.
- The framework supports the development of more reliable clinical decision support systems.
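The mechanisms listed above can be illustrated with a minimal NumPy sketch. This is a simplified, hypothetical illustration of the general pattern (per-modality self-attention, cross-modality attention fusion, and a next-step autoregressive target), not the paper's actual implementation; the modality names, dimensions, and fusion rule are assumptions for demonstration.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention (single head, no learned projections,
    # purely for illustration).
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
B, T, D = 2, 5, 8  # batch, time steps, embedding dim (arbitrary toy sizes)

# Two hypothetical modality streams, e.g. structured labs/vitals and
# pre-embedded clinical notes, aligned on the same time axis.
labs = rng.normal(size=(B, T, D))
notes = rng.normal(size=(B, T, D))

# 1) Self-attention within each modality captures temporal dependencies.
labs_ctx = attention(labs, labs, labs)
notes_ctx = attention(notes, notes, notes)

# 2) Cross-attention fuses modalities: each stream's queries attend over
#    the other stream's keys/values (summing both directions is an assumption).
fused = (attention(labs_ctx, notes_ctx, notes_ctx)
         + attention(notes_ctx, labs_ctx, labs_ctx))

# 3) Autoregressive pre-training target: predict the fused embedding at
#    the next time step from the current one.
inputs, targets = fused[:, :-1], fused[:, 1:]
print(inputs.shape, targets.shape)  # (2, 4, 8) (2, 4, 8)
```

In a real model, the attention layers would use learned query/key/value projections and multiple heads, and a decoder would be trained to map `inputs` to `targets`; the sketch only shows how the tensors flow.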
Paper Details
Computer Science > Machine Learning, arXiv:2501.18891 (cs)
[Submitted on 31 Jan 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: CAAT-EHR: Cross-Attentional Autoregressive Transformer for Multimodal Electronic Health Record Embeddings
Authors: Mohammad Al Olaimat, Shaika Chowdhury, Serdar Bozdag
Abstract: Electronic Health Records (EHRs) contain rich, longitudinal patient information across structured (e.g., labs, vitals, and imaging) and unstructured (e.g., clinical notes) modalities. While deep learning models such as RNNs and Transformers have advanced single- and multimodal EHR analysis, existing methods often optimize for specific downstream tasks and overlook the creation of generalizable patient representations that can be reused across multiple tasks. To address this gap, we propose CAAT-EHR, a novel Cross-Attentional Autoregressive Transformer architecture that produces task-agnostic, longitudinal embeddings of multimodal EHR data. In CAAT-EHR, self-attention layers capture temporal dependencies within each modality, while cross-attention layers fuse information across modalities to model complex interrelationships. During pre-training, an autoregressive decoder predicts future time steps from the fused embeddings, enforcing temporal consistency an...