[2604.02349] OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
Computer Science > Machine Learning

arXiv:2604.02349 (cs)

[Submitted on 19 Feb 2026]

Title: OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

Authors: Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang

Abstract: Preference-based reinforcement learning (PbRL) helps avoid sophisticated reward design and aligns better with human intentions, showing great promise in a variety of real-world applications. However, obtaining human preference feedback can be expensive and time-consuming, which poses a significant barrier for PbRL. In this work, we address the problem of low query efficiency in offline PbRL, pinpointing two primary causes: inefficient exploration and overoptimization of learned reward functions. In response to these challenges, we propose a novel algorithm, Offline PbRL via In-Dataset Exploration (OPRIDE), designed to enhance the query efficiency of offline PbRL. OPRIDE consists of two key features: a principled exploration strategy that maximizes the informativeness of queries and a discount scheduling mechanism aimed at mitigating overoptimization of the learned reward functions. Through empirical evaluations, we de...
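The abstract names OPRIDE's two components only at a high level, so the following is a minimal Python sketch of one common way each idea is instantiated, not the authors' actual method: query informativeness measured as disagreement among an ensemble of learned reward models (via a Bradley-Terry-style preference probability per member), and a discount factor annealed over training to curb overoptimization of the learned reward. All names here (reward_ensemble, segment_return, the direction of the schedule) are hypothetical assumptions.

```python
# Sketch only: plausible instantiations of the two ideas in the abstract,
# not the OPRIDE implementation.
import itertools
import numpy as np

def segment_return(reward_model, segment):
    """Sum of predicted rewards over a trajectory segment of (obs, act) pairs."""
    return sum(reward_model(obs, act) for obs, act in segment)

def most_informative_pair(segments, reward_ensemble):
    """Pick the segment pair whose preference label the ensemble of learned
    reward models disagrees on most (predictive variance as an
    informativeness proxy)."""
    best_pair, best_score = None, -np.inf
    for s1, s2 in itertools.combinations(segments, 2):
        probs = []
        for rm in reward_ensemble:
            r1, r2 = segment_return(rm, s1), segment_return(rm, s2)
            # Bradley-Terry probability that s1 is preferred over s2.
            probs.append(1.0 / (1.0 + np.exp(r2 - r1)))
        score = np.var(probs)  # high variance = high ensemble disagreement
        if score > best_score:
            best_pair, best_score = (s1, s2), score
    return best_pair

def scheduled_discount(step, total_steps, gamma_lo=0.9, gamma_hi=0.99):
    """One plausible discount schedule: stay myopic (low gamma) while the
    learned reward is unreliable, then anneal toward the full horizon."""
    frac = min(step / total_steps, 1.0)
    return gamma_lo + frac * (gamma_hi - gamma_lo)
```

In this sketch, the query budget is spent where the reward ensemble is most uncertain, and the schedule limits how far value estimates can compound errors in the learned reward early in training; whether OPRIDE uses these exact mechanisms is not determinable from the abstract alone.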