[2603.01863] Tide: A Customisable Dataset Generator for Anti-Money Laundering Research
Computer Science > Machine Learning
arXiv:2603.01863 (cs)
[Submitted on 2 Mar 2026]
Title: Tide: A Customisable Dataset Generator for Anti-Money Laundering Research
Authors: Montijn van den Beukel, Jože Martin Rožanec, Ana-Lucia Varbanescu
Abstract: The lack of accessible transactional data significantly hinders machine learning research for Anti-Money Laundering (AML). Privacy and legal concerns prevent the sharing of real financial data, while existing synthetic generators focus on simplistic structural patterns and neglect the temporal dynamics (timing and frequency) that characterise sophisticated laundering schemes. We present Tide, an open-source synthetic dataset generator that produces graph-based financial networks incorporating money laundering patterns defined by both structural and temporal characteristics. Tide enables reproducible, customisable dataset generation tailored to specific research needs. We release two reference datasets with varying illicit ratios (LI: 0.10%, HI: 0.19%), alongside the implementation of state-of-the-art detection models. Evaluation across these datasets reveals condition-dependent model rankings: LightGBM achieves the highest PR-AUC (78.05) in the low illicit ratio condition, while XGBoost performs best (85.12) at higher fraud prevalence. These divergent...