[2510.20264] Optimistic Task Inference for Behavior Foundation Models
Computer Science > Machine Learning
arXiv:2510.20264 (cs)
[Submitted on 23 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Optimistic Task Inference for Behavior Foundation Models
Authors: Thomas Rupf, Marco Bagatella, Marin Vlastelica, Andreas Krause

Abstract: Behavior Foundation Models (BFMs) are capable of retrieving a high-performing policy for any reward function specified directly at test time, commonly referred to as zero-shot reinforcement learning (RL). While this process is very efficient in terms of compute, it can be less so in terms of data: as a standard assumption, BFMs require computing rewards over a non-negligible inference dataset, assuming either access to a functional form of rewards or significant labeling effort. To alleviate these limitations, we tackle the problem of task inference purely through interaction with the environment at test time. We propose OpTI-BFM, an optimistic decision criterion that directly models uncertainty over reward functions and guides BFMs in data collection for task inference. Formally, we provide a regret bound for well-trained BFMs through a direct connection to upper-confidence algorithms for linear bandits. Empirically, we evaluate OpTI-BFM on established zero-shot benchmarks, and observe that it enables successor-features-based BFMs to identify and o...
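The abstract's connection to upper-confidence algorithms for linear bandits can be illustrated with a minimal sketch: if rewards are assumed linear in successor features, a ridge-regression estimate of the reward weights plus a LinUCB-style uncertainty bonus yields an optimistic choice of which task vector to condition the policy on next. Everything below is an illustrative assumption, not the paper's implementation: the dimension `d`, the confidence multiplier `beta`, the candidate set, and the `rollout_features` stand-in for executing the conditioned BFM policy are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4       # dimension of successor features / task vectors (assumed)
lam = 1.0   # ridge regularizer
beta = 1.0  # confidence-width multiplier (assumed constant for simplicity)

# Hypothetical ground truth: reward is linear in features, r(s) = <phi(s), w_true>
w_true = rng.normal(size=d)

# Candidate task vectors the BFM could be conditioned on (stand-in for its latent space)
candidates = rng.normal(size=(32, d))

def rollout_features(z):
    # Stand-in for rolling out the policy conditioned on z and measuring
    # its (noisy) expected discounted feature occupancy.
    return z / np.linalg.norm(z) + 0.1 * rng.normal(size=d)

A = lam * np.eye(d)  # regularized Gram matrix of observed feature vectors
b = np.zeros(d)      # reward-weighted feature sums

for t in range(50):
    A_inv = np.linalg.inv(A)
    w_hat = A_inv @ b  # ridge estimate of the unknown reward weights
    # Optimistic score: estimated return plus uncertainty bonus (LinUCB-style).
    scores = [c @ w_hat + beta * np.sqrt(c @ A_inv @ c) for c in candidates]
    z = candidates[int(np.argmax(scores))]
    phi = rollout_features(z)  # features visited by the conditioned policy
    r = phi @ w_true           # scalar reward observed from the environment
    A += np.outer(phi, phi)    # standard linear-bandit statistics update
    b += r * phi
```

The bonus term shrinks along directions already explored, so the loop trades off exploiting the current reward estimate against reducing uncertainty over it; this is the sense in which the selection rule is "optimistic".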