[2602.15159] Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding

arXiv - Machine Learning 3 min read Article

Summary

The paper presents the Augmented-Intrinsic Dual-Masked Autoencoder (AID-MAE), a novel method for learning from incomplete electronic health records (EHRs) by directly addressing missing data challenges in clinical time series.

Why It Matters

This research tackles a common obstacle to machine learning in healthcare: EHR data is routinely incomplete. By learning useful representations directly from sparse records, AID-MAE can support clinical decision-making and patient stratification.

Key Takeaways

  • AID-MAE effectively learns from incomplete EHR time series without prior imputation.
  • The model uses dual masking to represent missing values and enhance reconstruction.
  • It outperforms existing methods like XGBoost and DuETT across various clinical tasks.
  • The learned embeddings can stratify patient cohorts, aiding in personalized medicine.
  • This approach addresses the challenges of irregular sampling and heterogeneous missingness in EHR data.

Computer Science > Machine Learning
arXiv:2602.15159 (cs) [Submitted on 16 Feb 2026]

Title: Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
Authors: Xiao Xiang, David Restrepo, Hyewon Jeong, Yugang Jia, Leo Anthony Celi

Abstract: Learning from electronic health record (EHR) time series is challenging due to irregular sampling, heterogeneous missingness, and the resulting sparsity of observations. Prior self-supervised methods either impute before learning, represent missingness through a dedicated input signal, or optimize solely for imputation, reducing their capacity to efficiently learn representations that support clinical downstream tasks. We propose the Augmented-Intrinsic Dual-Masked Autoencoder (AID-MAE), which learns directly from incomplete time series by applying an intrinsic missing mask to represent naturally missing values and an augmented mask that hides a subset of observed values for reconstruction during training. AID-MAE processes only the unmasked subset of tokens and consistently outperforms strong baselines, including XGBoost and DuETT, across multiple clinical tasks on two datasets. In addition, the learned embeddings naturally stratify patient cohorts in the representation space.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.15159
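The dual-masking idea from the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation; it is a minimal illustration, assuming a dense time-step-by-feature matrix with NaN for naturally missing values, a hypothetical `aug_ratio` hyperparameter for the fraction of observed values hidden during training, and zero-filling for hidden tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_masks(x, aug_ratio=0.3, rng=rng):
    """Build the two masks used in dual-masked autoencoding.

    intrinsic: True where a value is naturally missing (NaN in the record).
    augmented: True where an observed value is hidden for reconstruction.
    """
    intrinsic = np.isnan(x)                 # naturally missing entries
    observed = ~intrinsic
    # Hide a random fraction of the observed entries for the training objective.
    augmented = observed & (rng.random(x.shape) < aug_ratio)
    return intrinsic, augmented

def training_views(x, intrinsic, augmented):
    """Return the encoder input and the reconstruction targets."""
    visible = ~(intrinsic | augmented)      # only these tokens are processed
    enc_input = np.where(visible, x, 0.0)   # hidden and missing entries zeroed
    targets = np.where(augmented, x, np.nan)  # loss only on augmented positions
    return enc_input, targets

# Toy record: 4 time steps x 3 features, with two naturally missing values.
x = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0],
              [0.5, 1.5, 2.5]])
intrinsic, augmented = dual_masks(x)
enc_input, targets = training_views(x, intrinsic, augmented)
```

The key property is that the reconstruction loss is computed only at augmented-mask positions, where ground truth exists, so the model never trains against imputed values at intrinsically missing positions.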

Related Articles

AI Startups

This AI startup envisions 100 Million New People Making Videogames


Reddit - Artificial Intelligence · 1 min ·
LLMs

A robot car with a Claude AI brain started a YouTube vlog about its own existence

Not a demo reel. Not a tutorial. A robot narrating its own experience — debugging, falling off shelves, questioning its identity. First-p...

Reddit - Artificial Intelligence · 1 min ·
AI Startups

Anthropic ramps up its political activities with a new PAC | TechCrunch

With the midterms right around the corner, the new group is positioned to back candidates who support the AI company's policy agenda.

TechCrunch - AI · 3 min ·
AI Startups

Anthropic buys biotech startup Coefficient Bio in $400M deal: Reports | TechCrunch

Anthropic has purchased the stealth biotech AI startup Coefficient Bio in a $400 million stock deal, according to The Information and Eri...

TechCrunch - AI · 3 min ·