[2602.21154] CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning
Summary
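The paper presents CG-DMER, a hybrid contrastive-generative framework for multimodal ECG representation learning. It targets two problems that arise when ECG signals are paired with clinical reports: intra-modality (lead-agnostic processing of ECGs) and inter-modality (biases introduced by aligning ECGs directly with free-text reports).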
Why It Matters
Accurate ECG interpretation is vital for diagnosing cardiovascular diseases. This research introduces CG-DMER, which improves the integration of ECG signals with clinical reports, potentially leading to better diagnostic outcomes and advancements in cardiovascular healthcare.
Key Takeaways
- CG-DMER addresses intra-modality challenges with spatial-temporal masked modeling, which captures dependencies across ECG leads that lead-agnostic models overlook.
- The framework mitigates inter-modality biases through a representation disentanglement strategy.
- Experiments show CG-DMER achieves state-of-the-art performance on multiple ECG datasets.
- The proposed methods enhance the modeling of fine-grained diagnostic patterns.
- This research could lead to improved diagnostic tools in cardiovascular medicine.
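The inter-modality concern is about aligning ECG signals directly with free-text reports. A common way to perform such direct alignment is a symmetric contrastive (InfoNCE-style) loss over paired embeddings. The sketch below illustrates that general technique only. It is not the CG-DMER objective, and the function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(ecg_embs, text_embs, temperature=0.07):
    """Hypothetical helper: symmetric contrastive loss over paired embeddings.

    ecg_embs[i] and text_embs[i] are assumed to come from the same patient record.
    """
    ecg = [l2_normalize(v) for v in ecg_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # Similarity of every ECG with every report, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(e, t)) / temperature for t in txt]
              for e in ecg]
    # Transpose to score every report against every ECG.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy usage with made-up 2-D embeddings (not real ECG or report features).
ecg = [[1.0, 0.0], [0.0, 1.0]]
reports = [[1.0, 0.1], [0.1, 1.0]]
loss = symmetric_info_nce(ecg, reports)
```

Because the loss pulls every ECG embedding toward the full embedding of its report, anything specific to how the report was written is pulled in as well, which is the kind of modality-specific bias the disentanglement strategy is meant to reduce.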
arXiv:2602.21154 (Computer Science > Artificial Intelligence), submitted on 24 Feb 2026
Authors: Ziwei Niu, Hao Sun, Shujun Bian, Xihong Yang, Lanfen Lin, Yuxin Liu, Yueming Jin
Abstract
Accurate interpretation of electrocardiogram (ECG) signals is crucial for diagnosing cardiovascular diseases. Recent multimodal approaches that integrate ECGs with accompanying clinical reports show strong potential, but they still face two main concerns from a modality perspective: (1) intra-modality: existing models process ECGs in a lead-agnostic manner, overlooking spatial-temporal dependencies across leads, which restricts their effectiveness in modeling fine-grained diagnostic patterns; (2) inter-modality: existing methods directly align ECG signals with clinical reports, introducing modality-specific biases due to the free-text nature of the reports. In light of these two issues, we propose CG-DMER, a contrastive-generative framework for disentangled multimodal ECG representation learning, powered by two key designs: (1) Spatial-temporal masked modeling is designed to better capture fine-grained temporal dyn...
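
Spatial-temporal masked modeling can be illustrated with a small sketch. The version below assumes a multi-lead ECG split into fixed-length time patches per lead. It hides a few whole leads (spatial) and then random patches in the remaining leads (temporal), so that a model would have to reconstruct the hidden parts from other leads and other time steps. The masking scheme, the ratios, and the function names are hypothetical and are not taken from the paper.

```python
import random


def spatiotemporal_mask(num_leads, num_patches, lead_ratio=0.25,
                        patch_ratio=0.5, seed=0):
    """Return a num_leads x num_patches grid of booleans; True means masked.

    Hypothetical scheme, not the paper's: hide whole leads first (spatial),
    then hide random time patches in the leads that remain visible (temporal).
    """
    rng = random.Random(seed)
    mask = [[False] * num_patches for _ in range(num_leads)]

    # Spatial masking: hide every patch of a few randomly chosen leads.
    n_masked_leads = int(num_leads * lead_ratio)
    masked_leads = set(rng.sample(range(num_leads), n_masked_leads))
    for lead in masked_leads:
        mask[lead] = [True] * num_patches

    # Temporal masking: in each visible lead, hide a random subset of patches.
    n_masked_patches = int(num_patches * patch_ratio)
    for lead in range(num_leads):
        if lead in masked_leads:
            continue
        for p in rng.sample(range(num_patches), n_masked_patches):
            mask[lead][p] = True
    return mask


def apply_mask(patches, mask, fill=0.0):
    """Replace masked patches with a constant fill value (stand-in for a mask token)."""
    return [
        [[fill] * len(patch) if masked else list(patch)
         for patch, masked in zip(lead_patches, lead_mask)]
        for lead_patches, lead_mask in zip(patches, mask)
    ]


# Toy usage: 12 leads, 10 patches per lead, 2 samples per patch (made-up values).
patches = [[[1.0, 2.0] for _ in range(10)] for _ in range(12)]
mask = spatiotemporal_mask(num_leads=12, num_patches=10)
masked_patches = apply_mask(patches, mask)
```

In a masked-modeling setup, the encoder would see `masked_patches` and a decoder would be trained to reconstruct the original values at the positions where `mask` is true.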