[2603.12567] Foundation-Model Surrogates Enable Data-Efficient Active Learning for Materials Discovery
Condensed Matter > Materials Science
arXiv:2603.12567 (cond-mat)
[Submitted on 13 Mar 2026 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: Foundation-Model Surrogates Enable Data-Efficient Active Learning for Materials Discovery
Authors: Jeffrey Hu, Rongzhi Dong, Ying Feng, Ming Hu, Jianjun Hu

Abstract: Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward promising candidates, reducing the number of costly synthesis-and-characterization cycles needed to identify optimal materials. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates, which suffer from complementary limitations: GP underfits complex composition-property landscapes due to rigid kernel assumptions, while RF produces unreliable heuristic uncertainty estimates in small-data regimes. This small-data challenge is pervasive in materials science, making reliable surrogate modeling extremely difficult with models trained from scratch on each new dataset. Here we propose In-Context Active Learning (ICAL), which addresses this bottleneck by replacing conventional surrogates with TabPFN, a transformer-based foundation model (FM) pre-trained on millions of synthetic regression tasks to meta-learn a unive...