[2512.18957] Online Robust Reinforcement Learning with General Function Approximation

arXiv - Machine Learning · 4 min read

Summary

This paper presents an online distributionally robust reinforcement learning (DR-RL) algorithm with general function approximation, enabling robust policies to be learned through direct interaction with the environment, without pre-collected data.

Why It Matters

The research addresses a critical challenge in reinforcement learning where performance deteriorates due to discrepancies between training and deployment environments. By proposing a method that operates without strong data assumptions, it enhances the applicability of DR-RL in real-world scenarios, making it a significant contribution to the field of machine learning.

Key Takeaways

  • Introduces a fully online DR-RL algorithm that learns robust policies through interaction.
  • Eliminates the need for prior knowledge or pre-collected datasets.
  • Establishes regret guarantees characterized by the robust Bellman-Eluder dimension.
  • Demonstrates sublinear regret bounds that do not scale with state or action space sizes.
  • Shows practical scalability in structured problem classes.

Computer Science > Machine Learning
arXiv:2512.18957 (cs)
[Submitted on 22 Dec 2025 (v1), last revised 18 Feb 2026 (this version, v2)]

Title: Online Robust Reinforcement Learning with General Function Approximation
Authors: Debamita Ghosh, George K. Atia, Yue Wang

Abstract: In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL) mitigates this issue by seeking policies that maximize performance under the most adverse transition dynamics within a prescribed uncertainty set. Most existing DR-RL approaches, however, rely on strong data availability assumptions, such as access to a generative model or large offline datasets, and are largely restricted to tabular settings. In this work, we propose a fully online DR-RL algorithm with general function approximation that learns robust policies solely through interaction, without requiring prior knowledge or pre-collected data. Our approach is based on a dual-driven fitted robust Bellman procedure that simultaneously estimates the value function and the corresponding worst-case backup operator. We establish regret guarantees for online DR-RL characterized by an intrinsic complexity notion, the robust Bellman...
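The abstract's "worst-case backup operator" refers to the robust Bellman backup: instead of taking an expectation of the next-state value under the nominal transition model, one takes the infimum over all transition distributions in an uncertainty set. As an illustrative sketch only (not the paper's algorithm), the snippet below evaluates that worst-case expectation for a KL-divergence uncertainty set via its standard dual form; the function name, the grid search over the dual variable, and the choice of KL ball are assumptions made for illustration.

```python
import numpy as np

def robust_bellman_backup(nominal_p, values, delta, betas=None):
    """Worst-case expectation of `values` over a KL ball of radius
    `delta` around the nominal next-state distribution `nominal_p`.

    Uses the standard dual representation:
        inf_{P : KL(P || P0) <= delta} E_P[V]
            = sup_{beta > 0} ( -beta * log E_{P0}[exp(-V / beta)] - beta * delta )
    which replaces an optimization over distributions with a
    one-dimensional search over the dual variable beta.
    """
    if betas is None:
        # Crude grid search over the scalar dual variable; a real
        # implementation would use a proper 1-D solver.
        betas = np.logspace(-3, 3, 200)
    best = -np.inf
    for beta in betas:
        # Log-sum-exp trick for numerical stability of log E[exp(-V/beta)].
        z = -values / beta
        m = z.max()
        log_mgf = m + np.log(np.dot(nominal_p, np.exp(z - m)))
        best = max(best, -beta * log_mgf - beta * delta)
    return best

p = np.array([0.5, 0.5])
v = np.array([0.0, 1.0])
print(robust_bellman_backup(p, v, delta=0.0))  # recovers the nominal mean, ~0.5
print(robust_bellman_backup(p, v, delta=0.5))  # adversarially tilted, below 0.5
```

With `delta = 0` the uncertainty set collapses to the nominal model and the backup reduces to the ordinary expectation; as `delta` grows, the backup shifts probability mass toward low-value next states, which is the pessimism that drives robust policy learning.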
