[2603.16065] Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models
Computer Science > Robotics

arXiv:2603.16065 (cs)

[Submitted on 17 Mar 2026 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Authors: Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao, Yue Wang

Abstract: Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward functions. In this paper, we propose a framework for online policy refinement that adapts foundation Vision-Language Models (VLMs) into online reward generators. We develop a robust, scalable reward model based on a state-of-the-art VLM, trained on a large-scale, multi-source dataset encompassing real-world robot trajectories, human-object interactions, and diverse simulated environments. Unlike prior approaches that evaluate entire trajectories post hoc, our method leverages the VLM to formulate a multifaceted reward signal comprising process, completion, and temporal contrastive rewards based on current visual observations. Initializing with a base policy trained via Imitation Learning (IL), we employ these VLM rewards to guide the policy to correct sub-optimal behaviors in a closed-loop manner. We evaluate our f...
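To make the reward formulation in the abstract concrete, below is a minimal sketch of how the three reward terms (process, completion, temporal contrastive) might be scored from the current visual observation and combined into one per-step scalar for online refinement. This is not the authors' implementation: the VLMRewardModel class, the weights, and the stub scorers (which stand in for actual VLM queries) are all illustrative assumptions.

# Hypothetical sketch of the multifaceted VLM reward described in the
# abstract. Not the paper's code: class name, weights, and the stub
# scorers below are assumptions; real scorers would query a fine-tuned VLM.
import numpy as np

class VLMRewardModel:
    """Stand-in for a VLM adapted into an online reward generator."""

    def __init__(self, w_process=0.4, w_completion=0.4, w_contrastive=0.2):
        self.w_process = w_process          # weight on task-progress term
        self.w_completion = w_completion    # weight on task-success term
        self.w_contrastive = w_contrastive  # weight on temporal-order term

    def process_score(self, frame, instruction):
        """Assumed VLM query: 'how far along is the task?' -> [0, 1]."""
        return float(np.clip(frame.mean(), 0.0, 1.0))  # placeholder logic

    def completion_score(self, frame, instruction):
        """Assumed VLM query: 'is the task complete?' -> {0, 1}."""
        return float(frame.mean() > 0.9)  # placeholder logic

    def temporal_contrastive_score(self, prev_frame, frame):
        """Assumed contrastive term: does `frame` plausibly follow
        `prev_frame` on a successful trajectory? -> [-1, 1]."""
        return float(np.tanh(frame.mean() - prev_frame.mean()))  # placeholder

    def reward(self, prev_frame, frame, instruction):
        """Combine the three signals into one per-step scalar reward."""
        return (self.w_process * self.process_score(frame, instruction)
                + self.w_completion * self.completion_score(frame, instruction)
                + self.w_contrastive * self.temporal_contrastive_score(prev_frame, frame))

# Usage sketch: score one transition produced by an IL-initialized policy.
rm = VLMRewardModel()
prev_frame = np.random.rand(224, 224, 3)  # placeholder camera frames
frame = np.random.rand(224, 224, 3)
r_t = rm.reward(prev_frame, frame, "put the red block in the bowl")
print(f"per-step VLM reward: {r_t:.3f}")

In the closed loop the abstract describes, a policy initialized via IL would act in the environment, this scalar would replace a hand-designed reward at each step, and a standard online RL update would use it to correct sub-optimal behaviors; the specific RL algorithm is not stated in the excerpt.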