[2602.19531] A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations
Summary
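This paper presents a statistical method for modeling irregular multivariate time series with missing data. It replaces the temporal axis with time-agnostic summary statistics per variable, feeds them to standard classifiers, and reports better performance than recent transformer and graph-based models on biomedical datasets.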
Why It Matters
The study addresses a critical challenge in predictive modeling, particularly in healthcare, where missing observations can hinder analysis. By proposing a simpler, time-agnostic approach, the authors provide an efficient alternative that enhances interpretability and performance, making it relevant for researchers and practitioners in data science and machine learning.
Key Takeaways
- Introduces a statistical method for handling irregular multivariate time series with missing values.
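- Reports state-of-the-art performance on four biomedical datasets (PhysioNet Challenge 2012 and 2019, PAMAP2, and MIMIC-III), surpassing recent transformer and graph-based models by 0.5-1.7% in AUROC/AUPRC and 1.1-1.7% in accuracy/F1-score.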
- Demonstrates that feature extraction is more impactful than classifier choice for performance gains.
- Identifies scenarios where the pattern of missing observations itself carries predictive signal.
- Offers a more interpretable and computationally efficient solution for time series classification.
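The time-agnostic features the paper's abstract describes (per variable: the mean and standard deviation of observed values, plus the mean and variability of changes between consecutive observations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the dict-of-lists input layout, the use of population standard deviation as the "variability" measure, and the zero-fill for variables with too few observations are all assumptions made here.

```python
import math
from statistics import mean, pstdev


def summarize_variable(values):
    """Return [mean, std, mean_diff, std_diff] for one variable.

    `values` is the variable's series in time order; missing entries are
    None or NaN. Timestamps are never used (time-agnostic).
    """
    # Keep only the observed values, preserving their order.
    observed = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not observed:
        # Assumption: zero-fill when a variable is never observed.
        return [0.0, 0.0, 0.0, 0.0]

    # Changes between consecutive *observed* values, skipping over gaps.
    diffs = [b - a for a, b in zip(observed, observed[1:])]

    return [
        mean(observed),
        pstdev(observed),  # assumption: population standard deviation
        mean(diffs) if diffs else 0.0,    # assumption: 0.0 with < 2 observations
        pstdev(diffs) if diffs else 0.0,
    ]


def extract_features(sample):
    """Concatenate the four statistics of every variable into one flat vector.

    `sample` maps a variable name to its series (hypothetical layout).
    Variables are visited in sorted-name order so the vector is stable.
    """
    features = []
    for name in sorted(sample):
        features.extend(summarize_variable(sample[name]))
    return features


if __name__ == "__main__":
    patient = {
        "heart_rate": [72.0, None, 75.0, float("nan"), 80.0],
        "temperature": [None, 37.1, None, None, None],
    }
    print(extract_features(patient))
```

Each sample becomes a vector of length 4 times the number of variables, regardless of how many observations it has or when they were taken. That fixed-length vector is what would be passed to a standard classifier such as logistic regression or XGBoost.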
Computer Science > Machine Learning
arXiv:2602.19531 (cs)
[Submitted on 23 Feb 2026]
Title: A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations
Authors: Dingyi Nie, Yixing Wu, C.-C. Jay Kuo
Abstract: Irregular multivariate time series with missing values present significant challenges for predictive modeling in domains such as healthcare. While deep learning approaches often focus on temporal interpolation or complex architectures to handle irregularities, we propose a simpler yet effective alternative: extracting time-agnostic summary statistics to eliminate the temporal axis. Our method computes four key features per variable (the mean and standard deviation of observed values, as well as the mean and variability of changes between consecutive observations) to create a fixed-dimensional representation. These features are then utilized with standard classifiers, such as logistic regression and XGBoost. Evaluated on four biomedical datasets (PhysioNet Challenge 2012, 2019, PAMAP2, and MIMIC-III), our approach achieves state-of-the-art performance, surpassing recent transformer and graph-based models by 0.5-1.7% in AUROC/AUPRC and 1.1-1.7% in accuracy/F1-score, while reducing computational complexity. Ablation studies demonstrate that feature extract...