[2602.12606] RelBench v2: A Large-Scale Benchmark and Repository for Relational Data
Summary
RelBench v2 is a major expansion of the RelBench benchmark for relational deep learning, adding four large-scale datasets for a total of 11 datasets with over 22 million rows, and introducing a new class of autocomplete predictive tasks for relational data modeling.
Why It Matters
As relational deep learning evolves, having robust benchmarks like RelBench v2 is crucial for systematic evaluation and advancement in the field. This resource aids researchers in assessing model performance across diverse relational datasets, fostering innovation and improving applications in various sectors, from academia to enterprise.
Key Takeaways
- RelBench v2 expands to 11 datasets with over 22 million rows, enhancing relational data evaluation.
- Introduces autocomplete tasks for predicting missing attribute values in relational tables under temporal constraints, moving beyond traditional forecasting tasks.
- Demonstrates that relational deep learning models outperform single-table baselines in various predictive tasks.
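To make the autocomplete-task idea concrete, here is a minimal sketch in plain Python. The table, column names, and the frequency-based baseline are illustrative assumptions, not RelBench's actual schema or API: a target attribute (`rating`) is masked, and a prediction must be made using only rows whose timestamps precede the target row, mirroring the temporal constraint the benchmark enforces.

```python
from collections import Counter
from datetime import date

# Hypothetical mini "reviews" table; names are illustrative only.
reviews = [
    {"user": "u1", "item": "i1", "rating": 5, "ts": date(2024, 1, 5)},
    {"user": "u2", "item": "i1", "rating": 4, "ts": date(2024, 2, 1)},
    {"user": "u1", "item": "i2", "rating": 5, "ts": date(2024, 3, 9)},
    {"user": "u3", "item": "i1", "rating": 4, "ts": date(2024, 4, 2)},
]

def autocomplete_rating(table, target_row, cutoff):
    """Predict a masked `rating` using only rows strictly before `cutoff`.

    A deliberately simple baseline: the most common rating among earlier
    reviews of the same item, falling back to the global mode.
    """
    past = [r for r in table if r["ts"] < cutoff]
    same_item = [r["rating"] for r in past if r["item"] == target_row["item"]]
    pool = same_item or [r["rating"] for r in past]
    return Counter(pool).most_common(1)[0][0]

# Mask the last row's rating and autocomplete it from earlier history only.
target = reviews[-1]
pred = autocomplete_rating(reviews[:-1], target, cutoff=target["ts"])
print(pred)
```

A real RDL model would replace the frequency baseline with a graph neural network over the linked tables, but the task structure, inferring an in-table value without peeking past its timestamp, is the same.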
Computer Science > Machine Learning
arXiv:2602.12606 (cs) [Submitted on 13 Feb 2026]
Title: RelBench v2: A Large-Scale Benchmark and Repository for Relational Data
Authors: Justin Gu, Rishabh Ranjan, Charilaos Kanatsoulis, Haiming Tang, Martin Jurkovic, Valter Hudovernik, Mark Znidar, Pranshu Chaturvedi, Parth Shroff, Fengyu Li, Jure Leskovec
Abstract: Relational deep learning (RDL) has emerged as a powerful paradigm for learning directly on relational databases by modeling entities and their relationships across multiple interconnected tables. As this paradigm evolves toward larger models and relational foundation models, scalable and realistic benchmarks are essential for enabling systematic evaluation and progress. In this paper, we introduce RelBench v2, a major expansion of the RelBench benchmark for RDL. RelBench v2 adds four large-scale relational datasets spanning scholarly publications, enterprise resource planning, consumer platforms, and clinical records, increasing the benchmark to 11 datasets comprising over 22 million rows across 29 tables. We further introduce autocomplete tasks, a new class of predictive objectives that require models to infer missing attribute values directly within relational tables while respecting temporal constraints, expanding beyond traditional forecasting tasks constructe...