[2603.25923] Preventing Data Leakage in EEG-Based Survival Prediction: A Two-Stage Embedding and Transformer Framework
Computer Science > Machine Learning
arXiv:2603.25923 (cs)
[Submitted on 26 Mar 2026]
Title: Preventing Data Leakage in EEG-Based Survival Prediction: A Two-Stage Embedding and Transformer Framework
Authors: Yixin Zhou, Zhixiang Liu, Vladimir I. Zadorozhny, Jonathan Elmer
Abstract: Deep learning models have shown promise in EEG-based outcome prediction for comatose patients after cardiac arrest, but their reliability is often compromised by subtle forms of data leakage. In particular, when long EEG recordings are segmented into short windows and reused across multiple training stages, models may implicitly encode and propagate label information, leading to overly optimistic validation performance and poor generalization. In this study, we identify a previously overlooked form of data leakage in multi-stage EEG modeling pipelines. We demonstrate that violating strict patient-level separation can significantly inflate validation metrics while causing substantial degradation on independent test data. To address this issue, we propose a leakage-aware two-stage framework. In the first stage, short EEG segments are transformed into embedding representations using a convolutional neural network with an ArcFace objective. In the second stage, a Transformer-based model aggregates these em...
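The patient-level separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `patient_level_split`, the `val_fraction` and `seed` parameters, and the toy patient IDs are all assumptions made for illustration. The idea shown is that the split is made over patients rather than over individual EEG segments, so every segment from a given patient lands on the same side.

```python
import random
from collections import defaultdict


def patient_level_split(segment_patient_ids, val_fraction=0.2, seed=0):
    """Split segment indices so that no patient appears in both sets.

    Hypothetical helper: segment_patient_ids[i] is the patient ID of segment i.
    """
    # Group segment indices by the patient they came from.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(segment_patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not segments) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out at least one patient for validation.
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    train_idx = sorted(i for p in patients[n_val:] for i in by_patient[p])
    val_idx = sorted(i for p in patients[:n_val] for i in by_patient[p])
    return train_idx, val_idx, val_patients


# Toy data (made up): ten segments drawn from five patients.
segment_patient_ids = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4", "p5", "p5"]
train_idx, val_idx, val_patients = patient_level_split(segment_patient_ids)
train_patients = {segment_patient_ids[i] for i in train_idx}
```

In a two-stage pipeline of the kind the abstract describes, one would presumably compute such a split once and reuse the same patient assignment for both the embedding stage and the aggregation stage, so that validation patients never contribute segments to either training stage.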