[2509.19464] Evaluation-Aware Reinforcement Learning
Computer Science > Artificial Intelligence
arXiv:2509.19464 (cs)
[Submitted on 23 Sep 2025 (v1), last revised 20 Mar 2026 (this version, v3)]

Title: Evaluation-Aware Reinforcement Learning
Authors: Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum

Abstract: Policy evaluation is a core component of many reinforcement learning (RL) algorithms and a critical tool for ensuring safe deployment of RL policies. However, existing policy evaluation methods often suffer from high variance or bias. To address these issues, we introduce Evaluation-Aware Reinforcement Learning (EvA-RL), a general policy learning framework that considers evaluation accuracy at train-time, as opposed to standard post-hoc policy evaluation methods. Specifically, EvA-RL directly optimizes policies for efficient and accurate evaluation, in addition to being performant. We provide an instantiation of EvA-RL and demonstrate through a combination of theoretical analysis and empirical results that EvA-RL effectively trades off between evaluation accuracy and expected return. Finally, we show that the evaluation-aware policy and the evaluation mechanism itself can be co-learned to mitigate this tradeoff, providing the evaluation benefits without significantly sacrificing policy performance. This work opens a new line of research that elevates reliable evaluation...
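The abstract does not spell out the training objective, but one plausible reading of "trades off between evaluation accuracy and expected return" is a regularized objective of the form J(π) = E[return] − λ · eval_error(π). The toy sketch below is purely illustrative (the landscape, the error proxy, and the penalty weight λ are all assumptions, not from the paper); it only shows how increasing λ shifts the preferred policy toward ones that are easier to evaluate.

```python
import numpy as np

def expected_return(theta):
    # Toy return landscape peaking at theta = 1.0 (assumed, illustrative).
    return -(theta - 1.0) ** 2 + 1.0

def evaluation_error(theta):
    # Toy proxy: policies with larger |theta| are assumed harder to
    # evaluate accurately (purely illustrative, not the paper's model).
    return 0.5 * theta ** 2

def eva_rl_score(theta, lam):
    # Hypothetical evaluation-aware objective:
    # return minus a penalty on evaluation error.
    return expected_return(theta) - lam * evaluation_error(theta)

# Sweep the penalty weight: a larger lam pulls the optimum from the
# pure-return maximizer (theta = 1.0) toward easier-to-evaluate policies.
thetas = np.linspace(-2, 2, 401)
for lam in (0.0, 1.0, 4.0):
    best = thetas[np.argmax(eva_rl_score(thetas, lam))]
    print(f"lambda={lam}: best theta = {best:.2f}")
```

Analytically, the optimum of this toy objective is θ* = 2 / (2 + λ), so λ = 0 recovers the pure-return policy while larger λ trades return for evaluability, mirroring the tradeoff the abstract describes.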