[2512.18957] Online Robust Reinforcement Learning with General Function Approximation
Summary
This paper presents a fully online algorithm for distributionally robust reinforcement learning (DR-RL) with general function approximation, learning robust policies through direct interaction with the environment rather than from pre-collected data.
Why It Matters
The research addresses a critical challenge in reinforcement learning where performance deteriorates due to discrepancies between training and deployment environments. By proposing a method that operates without strong data assumptions, it enhances the applicability of DR-RL in real-world scenarios, making it a significant contribution to the field of machine learning.
Key Takeaways
- Introduces a fully online DR-RL algorithm that learns robust policies through interaction.
- Eliminates the need for prior knowledge or pre-collected datasets.
- Establishes regret guarantees characterized by the robust Bellman-Eluder dimension.
- Demonstrates sublinear regret bounds that do not scale with state or action space sizes.
- Shows practical scalability in structured problem classes.
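At the core of DR-RL methods like this one is a robust Bellman backup, which replaces the usual expected next-state value with its worst case over an uncertainty set of transition models. A minimal tabular sketch of this idea, assuming a finite uncertainty set (the paper itself targets general function approximation and a more general uncertainty-set construction; `robust_value_iteration` is a hypothetical name for illustration):

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma=0.9, iters=200):
    """Value iteration with a worst-case Bellman backup over a finite
    uncertainty set of transition kernels (an illustrative special case).

    P_set: list of candidate kernels, each of shape (S, A, S).
    R:     reward array of shape (S, A).
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Worst-case expected next value over the uncertainty set, per (s, a).
        worst = np.min([P @ V for P in P_set], axis=0)   # shape (S, A)
        V = np.max(R + gamma * worst, axis=1)            # greedy robust backup
    return V
```

Enlarging the uncertainty set can only lower the resulting value function, which is the conservatism DR-RL trades for robustness to deployment-time model mismatch.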
Paper Details
arXiv:2512.18957 [cs] (Computer Science > Machine Learning)
Submitted on 22 Dec 2025 (v1); last revised 18 Feb 2026 (v2)
Authors: Debamita Ghosh, George K. Atia, Yue Wang
Abstract: In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL) mitigates this issue by seeking policies that maximize performance under the most adverse transition dynamics within a prescribed uncertainty set. Most existing DR-RL approaches, however, rely on strong data availability assumptions, such as access to a generative model or large offline datasets, and are largely restricted to tabular settings. In this work, we propose a fully online DR-RL algorithm with general function approximation that learns robust policies solely through interaction, without requiring prior knowledge or pre-collected data. Our approach is based on a dual-driven fitted robust Bellman procedure that simultaneously estimates the value function and the corresponding worst-case backup operator. We establish regret guarantees for online DR-RL characterized by an intrinsic complexity notion, the robust Bellman...
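The "dual-driven" estimation mentioned in the abstract typically exploits the fact that, for standard divergence balls, the inner minimization over transition models collapses to a one-dimensional dual problem. As one concrete and widely used instance, a sketch assuming a KL-divergence uncertainty set (which may differ from the paper's exact construction; `kl_robust_expectation` is a hypothetical helper name):

```python
import numpy as np

def kl_robust_expectation(v, p0, rho, betas=None):
    """Worst-case expectation  inf_{KL(P||P0) <= rho} E_P[v]  via its
    scalar dual (the Hu-Hong form for KL-constrained robust optimization):

        sup_{beta > 0}  -beta * log E_{P0}[exp(-v / beta)] - beta * rho

    solved here by a simple grid search over the dual variable beta.
    """
    if betas is None:
        betas = np.geomspace(1e-3, 1e3, 4000)
    v, p0 = np.asarray(v, float), np.asarray(p0, float)
    mask = p0 > 0                   # KL-feasible models share P0's support
    w = v[mask] - v[mask].min()     # shift values for numerical stability
    e = np.exp(-w[None, :] / betas[:, None]) @ p0[mask]  # E_{P0}[exp(-w/beta)]
    duals = -betas * np.log(np.maximum(e, 1e-300)) - betas * rho
    return v[mask].min() + duals.max()
```

A robust Bellman iteration then applies this scalar optimization at every state-action pair, with `v` the current value estimate and `p0` the nominal next-state distribution, which is what makes the worst-case backup estimable from samples alone.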