[2602.20329] CaDrift: A Time-dependent Causal Generator of Drifting Data Streams
Summary
The paper introduces CaDrift, a synthetic data generator built on Structural Causal Models (SCMs). It produces time-dependent data streams with controlled shift events, so that machine learning methods can be evaluated under evolving data.
Why It Matters
Models deployed on data streams must cope with distributions that change over time. Because CaDrift controls when and how shifts occur, researchers can test how algorithms respond and adapt to specific kinds of shift before facing real-world data.
Key Takeaways
- CaDrift generates synthetic data streams with controlled time-dependent shifts.
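- The framework is based on Structural Causal Models (SCMs); it creates shifts by drifting the SCM's mapping functions, which changes the cause-and-effect relationships between features and the target.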
- In the reported experiments, classifier accuracy tends to drop after distributional shift events and then recover gradually, which the authors take as confirmation that the generator simulates shifts effectively.
- CaDrift is available on GitHub for further research and application.
- The tool targets the evaluation of methods that must adapt to evolving data.
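To make the takeaways above concrete, here is a minimal sketch of the general idea, not CaDrift's actual API or code. It assumes a toy SCM with two features and a binary target, and it drifts the mapping function from one feature to the target by interpolating its weight around a chosen time step. The function name `make_stream`, the parameters `drift_at` and `width`, and all coefficients are hypothetical choices made for illustration.

```python
import random


def make_stream(n_samples, drift_at, width=1, seed=0):
    """Yield (t, x1, x2, y) from a toy SCM whose x2 -> y mapping drifts.

    Hypothetical illustration only; names and coefficients are assumptions,
    not taken from the CaDrift implementation.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rng = random.Random(seed)
    x2_prev = 0.0
    for t in range(n_samples):
        # Independent exogenous noise terms, one per node of the SCM.
        u1, u2, uy = (rng.gauss(0.0, 1.0) for _ in range(3))

        # Root node: driven by its noise term only.
        x1 = u1
        # Child of x1 that also depends on its own previous value,
        # which makes the stream time-dependent.
        x2 = 0.8 * x1 + 0.5 * x2_prev + 0.3 * u2

        # Drift: alpha ramps linearly from 0 to 1 over `width` steps
        # starting at `drift_at`, moving the weight from +1.5 to -1.5.
        alpha = min(1.0, max(0.0, (t - drift_at) / width))
        weight = (1.0 - alpha) * 1.5 + alpha * (-1.5)

        # Target node: its mapping function from x2 changes with the drift.
        y = int(weight * x2 + 0.2 * uy > 0)

        x2_prev = x2
        yield t, x1, x2, y
```

Calling `make_stream(2000, drift_at=1000)` yields a stream in which the relationship between `x2` and `y` reverses just after step 1000, so a classifier trained on the early part of the stream should lose accuracy after that point. A larger `width` gives a gradual rather than abrupt drift.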
arXiv:2602.20329 (cs)
[Submitted on 23 Feb 2026]
Title: CaDrift: A Time-dependent Causal Generator of Drifting Data Streams
Authors: Eduardo V. L. Barboza, Jean Paul Barddal, Robert Sabourin, Rafael M. O. Cruz
Abstract: This work presents Causal Drift Generator (CaDrift), a time-dependent synthetic data generator framework based on Structural Causal Models (SCMs). The framework produces a virtually infinite combination of data streams with controlled shift events and time-dependent data, making it a tool to evaluate methods under evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting mapping functions of the SCM, which change underlying cause-and-effect relationships between features and the target. In addition, CaDrift models occasional perturbations by leveraging interventions in causal modeling. Experimental results show that, after distributional shift events, the accuracy of classifiers tends to drop, followed by a gradual retrieval, confirming the generator's effectiveness in simulating shifts. The framework has been made available on GitHub.
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
Cite as: arXiv:2602.20329 [cs.LG] (or arXiv:2602.20329v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2602.20...