[2602.18495] RDBLearn: Simple In-Context Prediction Over Relational Databases
Summary
RDBLearn introduces a simple approach for in-context learning (ICL) over relational databases, enabling prediction on new tasks without per-task training or heavy tuning, at times even outperforming strong supervised baselines.
Why It Matters
Because so much real-world predictive signal lives in relational databases spread across multiple linked tables rather than a single flat table, RDBLearn addresses a significant gap: it lets pretrained tabular models leverage relational data directly. This can streamline prediction pipelines and improve accuracy across diverse applications.
Key Takeaways
- RDBLearn extends tabular in-context learning to relational databases.
- It utilizes relational aggregations to enhance predictive signals.
- The toolkit offers a scikit-learn-style estimator interface that makes it straightforward to swap tabular ICL backends.
- RDBLearn is the best-performing foundation model approach evaluated, at times outperforming strong supervised baselines on RelBench and 4DBInfer datasets.
- This approach simplifies the prediction process in complex data environments.
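The recipe behind these takeaways, featurizing each target row with relational aggregations over its linked records and materializing the result as a flat augmented table, can be sketched with pandas. The schema below (a `customers` target table linked to an `orders` table) and the choice of aggregations are illustrative assumptions, not taken from the paper:

```python
import pandas as pd

# Hypothetical toy schema: a "customers" target table linked to an
# "orders" child table via customer_id (names are illustrative only).
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "region": ["US", "EU", "US"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2, 2, 2],
                       "amount": [10.0, 20.0, 5.0, 5.0, 5.0]})

# Step 1: relational aggregations over each target row's linked records.
aggs = (orders.groupby("customer_id")["amount"]
              .agg(order_count="count", amount_mean="mean", amount_sum="sum")
              .reset_index())

# Step 2: materialize the augmented flat table for the target rows.
# Targets with no linked records (customer 3) get zero-valued features.
augmented = customers.merge(aggs, on="customer_id", how="left").fillna(0)

# Step 3 would feed `augmented` to an off-the-shelf tabular ICL model.
print(augmented)
```

The key property is that after step 2 the problem looks like ordinary single-table prediction, so any tabular foundation model can be applied unchanged.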
Computer Science > Databases
arXiv:2602.18495 (cs) [Submitted on 14 Feb 2026]
Title: RDBLearn: Simple In-Context Prediction Over Relational Databases
Authors: Yanlin Zhang, Linjie Xu, Quan Gan, David Wipf, Minjie Wang
Abstract: Recent advances in tabular in-context learning (ICL) show that a single pretrained model can adapt to new prediction tasks from a small set of labeled examples, avoiding per-task training and heavy tuning. However, many real-world tasks live in relational databases, where predictive signal is spread across multiple linked tables rather than a single flat table. We show that tabular ICL can be extended to relational prediction with a simple recipe: automatically featurize each target row using relational aggregations over its linked records, materialize the resulting augmented table, and run an off-the-shelf tabular foundation model on it. We package this approach in RDBLearn (this https URL), an easy-to-use toolkit with a scikit-learn-style estimator interface that makes it straightforward to swap different tabular ICL backends; a complementary agent-specific interface is provided as well. Across a broad collection of RelBench and 4DBInfer datasets, RDBLearn is the best-performing foundation model approach we evaluate, at times even outperforming strong supervised baselines trained or fine-tuned on each ...
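The abstract's scikit-learn-style estimator interface is not reproduced here; the sketch below only shows the general shape such an interface takes. The class name, and the 1-nearest-neighbor lookup standing in for a real tabular ICL backend, are assumptions for illustration:

```python
import numpy as np

class ICLEstimator:
    """Hypothetical sketch of a scikit-learn-style (fit/predict) estimator.

    A real tabular ICL backend would condition a pretrained foundation
    model on the labeled context set at predict time; a 1-nearest-neighbor
    lookup stands in here so the sketch is runnable without that model.
    """

    def fit(self, X, y):
        # ICL-style "fit": store the labeled context examples; no training.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Nearest context row decides each label (backend stand-in).
        dists = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=-1)
        return self.y_[np.argmin(dists, axis=1)]

# Usage: labeled rows of the augmented table act as the in-context set.
clf = ICLEstimator().fit([[0.0], [10.0]], ["low", "high"])
print(clf.predict([[1.0], [9.0]]))
```

Following the fit/predict convention is what would let such a toolkit drop into existing scikit-learn pipelines while swapping backends behind the same interface.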