[2602.15637] The Stationarity Bias: Stratified Stress-Testing for Time-Series Imputation in Regulated Dynamical Systems
Summary
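The paper identifies a 'Stationarity Bias' in time-series imputation benchmarks: uniform random masking combined with shape-agnostic metrics (MSE, RMSE) weights evaluation by regime prevalence, so simple methods look superior because most test points fall in the easy, stable regime. It proposes a 'Stratified Stress-Test' that scores methods separately in Stationary and Transient regimes.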
Why It Matters
This research addresses a critical gap in evaluating time-series imputation methods, particularly in regulated systems where performance can vary significantly between stable and transient conditions. By formalizing the Stationarity Bias, it provides a more accurate framework for assessing model robustness, which is vital for applications in healthcare and industrial operations.
Key Takeaways
- Traditional benchmarks may misrepresent model performance due to Stationarity Bias.
- Linear interpolation achieves state-of-the-art reconstruction during stable intervals, so complex architectures add computational cost there without improving accuracy.
- Deep learning models are essential for maintaining accuracy during critical transient events.
Computer Science > Machine Learning
arXiv:2602.15637 (cs)
[Submitted on 17 Feb 2026]
Authors: Amirreza Dolatpour Fathkouhi, Alireza Namazi, Heman Shakeri
Abstract: Time-series imputation benchmarks employ uniform random masking and shape-agnostic metrics (MSE, RMSE), implicitly weighting evaluation by regime prevalence. In systems with a dominant attractor -- homeostatic physiology, nominal industrial operation, stable network traffic -- this creates a systematic Stationarity Bias: simple methods appear superior because the benchmark predominantly samples the easy, low-entropy regime where they trivially succeed. We formalize this bias and propose a Stratified Stress-Test that partitions evaluation into Stationary and Transient regimes. Using Continuous Glucose Monitoring (CGM) as a testbed -- chosen for its rigorous ground-truth forcing functions (meals, insulin) that enable precise regime identification -- we establish three findings with broad implications: (i) Stationary Efficiency: Linear interpolation achieves state-of-the-art reconstruction during stable intervals, confirming that complex architectures are computationally wastef...
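The partition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy signal, the regime labels (assigned by construction rather than from meal or insulin events), and the names `make_toy_signal`, `linear_interpolate`, and `stratified_mse` are all assumptions made for illustration. The sketch masks points uniformly at random, imputes them by linear interpolation, and reports MSE pooled and per regime.

```python
import bisect
import math
import random


def make_toy_signal(n=200, start=90, length=20):
    """Flat baseline with one smooth excursion (a toy stand-in, not CGM data)."""
    truth = [100.0] * n
    is_transient = [False] * n
    for k in range(length):
        # Half-sine bump. Regime labels come from construction here; the paper
        # identifies regimes from forcing functions (meals, insulin).
        truth[start + k] = 100.0 + 80.0 * math.sin(math.pi * (k + 1) / (length + 1))
        is_transient[start + k] = True
    return truth, is_transient


def linear_interpolate(values, masked):
    """Fill masked indices from the nearest observed neighbours.

    Assumes at least one index is observed; edge gaps copy the nearest value.
    """
    observed = [i for i in range(len(values)) if i not in masked]
    filled = list(values)
    for i in sorted(masked):
        pos = bisect.bisect_left(observed, i)
        left = observed[pos - 1] if pos > 0 else None
        right = observed[pos] if pos < len(observed) else None
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled


def stratified_mse(truth, imputed, masked, is_transient):
    """MSE over masked points: pooled, and split by regime (masked must be non-empty)."""
    sq_err = {"stationary": [], "transient": []}
    for i in masked:
        regime = "transient" if is_transient[i] else "stationary"
        sq_err[regime].append((truth[i] - imputed[i]) ** 2)
    pooled = sq_err["stationary"] + sq_err["transient"]
    report = {"pooled": sum(pooled) / len(pooled)}
    for regime, errs in sq_err.items():
        # A regime with no masked points has no defined score.
        report[regime] = sum(errs) / len(errs) if errs else float("nan")
    report["transient_share"] = len(sq_err["transient"]) / len(pooled)
    return report


truth, is_transient = make_toy_signal()
rng = random.Random(0)
# Uniform random masking: each regime is sampled in proportion to its prevalence.
masked = set(rng.sample(range(len(truth)), k=60))
report = stratified_mse(truth, linear_interpolate(truth, masked), masked, is_transient)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

The pooled score is the share-weighted average of the two regime scores. When a method is exact on the flat baseline, its transient error is therefore larger than the pooled figure by the inverse of the transient share, which is the effect a stratified report makes visible.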