[2602.20329] CaDrift: A Time-dependent Causal Generator of Drifting Data Streams

arXiv - Machine Learning, February 25, 2026

Summary

The paper introduces CaDrift, a synthetic data generator that simulates time-dependent causal shifts in data streams, enhancing evaluation of machine learning models under evolving conditions.

Why It Matters

As data environments become increasingly dynamic, models need to be tested under shifting conditions before they are trusted. CaDrift lets researchers simulate controlled shift events, which helps in studying how algorithms degrade and adapt when the data changes.

Key Takeaways

CaDrift generates synthetic data streams with controlled time-dependent shifts.
The framework utilizes Structural Causal Models to simulate causal relationships.
In the paper's experiments, classifier accuracy tends to drop after distributional shift events and then gradually recovers, which the authors take as evidence that the generator simulates shifts effectively.
CaDrift is available on GitHub for further research and application.
The tool addresses the growing need for models that can adapt to evolving data conditions.
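The takeaway about classifier performance under shift can be made concrete with a small sketch. The code below is not CaDrift and is not taken from the paper: it uses test-then-train (prequential) evaluation, a common protocol for data streams, on a hypothetical one-feature stream whose labels flip at a chosen time step. The names `prequential_accuracy` and `flipping_stream`, the tiny logistic learner, and all parameter values are assumptions made for illustration only.

```python
import math
import random
from collections import deque


def flipping_stream(n, drift_at, seed=0):
    """Hypothetical stream: label is (x > 0) before drift_at, inverted after."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.gauss(0.0, 1.0)
        label = int(x > 0.0)
        yield x, (label if t < drift_at else 1 - label)


def prequential_accuracy(stream, window=100, lr=0.1):
    """Test-then-train evaluation of a tiny online logistic model.

    Returns the sliding-window accuracy recorded after every sample.
    """
    w, b = 0.0, 0.0
    recent = deque(maxlen=window)  # 1 for a correct prediction, 0 otherwise
    history = []
    for x, y in stream:
        # Clamp the logit so math.exp cannot overflow.
        z = max(-30.0, min(30.0, w * x + b))
        p = 1.0 / (1.0 + math.exp(-z))
        # Test first: score the prediction made before seeing the label.
        recent.append(int((p >= 0.5) == bool(y)))
        history.append(sum(recent) / len(recent))
        # Then train: one SGD step on the logistic loss.
        w -= lr * (p - y) * x
        b -= lr * (p - y)
    return history
```

Calling `prequential_accuracy(flipping_stream(3000, drift_at=1000))` returns one accuracy value per sample. With these toy settings the curve should dip shortly after step 1000 and climb back as the learner adapts, which is the qualitative pattern the summary describes.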

arXiv:2602.20329 (cs), submitted on 23 Feb 2026

Title: CaDrift: A Time-dependent Causal Generator of Drifting Data Streams

Authors: Eduardo V. L. Barboza, Jean Paul Barddal, Robert Sabourin, Rafael M. O. Cruz

Abstract: This work presents Causal Drift Generator (CaDrift), a time-dependent synthetic data generator framework based on Structural Causal Models (SCMs). The framework produces a virtually infinite combination of data streams with controlled shift events and time-dependent data, making it a tool to evaluate methods under evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting mapping functions of the SCM, which change underlying cause-and-effect relationships between features and the target. In addition, CaDrift models occasional perturbations by leveraging interventions in causal modeling. Experimental results show that, after distributional shift events, the accuracy of classifiers tends to drop, followed by a gradual retrieval, confirming the generator's effectiveness in simulating shifts. The framework has been made available on GitHub.

Subjects: Machine Learning (cs.LG); Databases (cs.DB)

Cite as: arXiv:2602.20329 [cs.LG] (or arXiv:2602.20329v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2602.20...
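The two mechanisms named in the abstract, drifting a mapping function of the SCM and applying an intervention, can be sketched on a toy model. This is not the CaDrift implementation or its API. The causal graph (x1 -> x2 -> y and x1 -> y), the linear mappings, the noise scale, the intervention value, and the name `toy_scm_stream` are all assumptions chosen to keep the example small.

```python
import random


def toy_scm_stream(n_samples, drift_at, intervene_at=None, seed=0):
    """Yield (x1, x2, y) from a toy SCM whose feature-to-target mapping drifts.

    Assumed graph, for illustration only: x1 -> x2 -> y and x1 -> y.
    """
    rng = random.Random(seed)
    for t in range(n_samples):
        # Root node: driven by exogenous noise only.
        x1 = rng.gauss(0.0, 1.0)
        # Child node: a fixed mapping of its parent plus small noise.
        x2 = 0.5 * x1 + rng.gauss(0.0, 0.1)
        # Occasional perturbation modelled as an intervention do(x2 = 3.0):
        # x2 is set by hand, cutting its dependence on x1 for this step.
        if intervene_at is not None and t == intervene_at:
            x2 = 3.0
        # Drift: the weight on the edge x1 -> y flips sign at drift_at,
        # which changes how the features relate to the target.
        w = 1.0 if t < drift_at else -1.0
        y = int(w * x1 + x2 > 0.0)
        yield x1, x2, y
```

For example, `list(toy_scm_stream(200, drift_at=100, intervene_at=150))` gives 200 samples in which `y` mostly agrees with the sign of `x1` before step 100 and mostly disagrees afterwards, with a single intervened sample at step 150. An abrupt sign flip is only one possible drift; a gradual change could be sketched by interpolating `w` over a range of steps.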

