[2602.01379] WAKESET: A Large-Scale, High-Reynolds Number Flow Dataset for Machine Learning of Turbulent Wake Dynamics
Summary
The paper introduces WAKESET, a large-scale, high-Reynolds number flow dataset for training machine learning models of turbulent wake dynamics. It addresses the scarcity of high-fidelity training data in fluid dynamics.
Why It Matters
WAKESET fills a critical gap in the availability of high-quality datasets for training machine learning models in fluid dynamics, particularly for complex, high-Reynolds number flows. Such flows dominate practical engineering applications yet remain computationally prohibitive to simulate at scale, so a large dataset of them can help models generalise beyond simplified, low-Reynolds number regimes.
Key Takeaways
- WAKESET provides over 4,000 high-fidelity simulations of turbulent flows.
- The dataset focuses on practical engineering problems, enhancing model training for real-world applications.
- High-Reynolds number data is crucial for developing robust machine learning models in fluid dynamics.
arXiv:2602.01379 (physics), Physics > Fluid Dynamics
Submitted on 1 Feb 2026 (v1), last revised 22 Feb 2026 (this version, v2)
Authors: Zachary Cooper-Baldock, Paulo E. Santos, Russell S.A. Brinkworth, Karl Sammut
Abstract
Machine learning (ML) offers transformative potential for computational fluid dynamics (CFD), promising to accelerate simulations, improve turbulence modelling, and enable real-time flow prediction and control, capabilities that could fundamentally change how engineers approach fluid dynamics problems. However, the exploration of ML in fluid dynamics is critically hampered by the scarcity of large, diverse, and high-fidelity datasets suitable for training robust models. This limitation is particularly acute for highly turbulent flows, which dominate practical engineering applications yet remain computationally prohibitive to simulate at scale. High-Reynolds number turbulent datasets are essential for ML models to learn the complex, multi-scale physics characteristic of real-world flows, enabling generalisation beyond the simplified, low-Reynolds number regimes often represented in existing dat...