[2603.27884] Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards
Computer Science > Machine Learning

arXiv:2603.27884 (cs)

[Submitted on 29 Mar 2026]

Title: Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

Authors: Kihyun Yu, Seoungbin Bae, Dabeen Lee

Abstract: We study safe reinforcement learning in finite-horizon linear mixture constrained Markov decision processes (CMDPs) with adversarial rewards under full-information feedback and an unknown transition kernel. We propose a primal-dual policy optimization algorithm that achieves regret and constraint violation bounds of $\widetilde{O}(\sqrt{d^2 H^3 K})$ under mild conditions, where $d$ is the feature dimension, $H$ is the horizon, and $K$ is the number of episodes. To the best of our knowledge, this is the first provably efficient algorithm for linear mixture CMDPs with adversarial rewards. In particular, our regret bound is near-optimal, matching the known minimax lower bound up to logarithmic factors. The key idea is to introduce a regularized dual update that enables a drift-based analysis. This step is essential because strong-duality-based analyses cannot be applied directly when the reward functions change across episodes. In addition, we extend weighted ridge regression-based parameter estimation to the constrained setting, allowing us to construct tighter confidence sets.
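The abstract's key idea, a regularized dual update, can be illustrated generically. The sketch below is not the paper's algorithm (which is not reproduced on this page); it is a minimal, hypothetical example of what a regularized dual ascent step looks like in primal-dual constrained RL, where shrinking the multiplier toward zero keeps the dual iterates bounded, the property that a drift-based analysis exploits when rewards change across episodes. All names (`eta`, `alpha`, `lmbda_max`, `cost_estimate`, `budget`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def regularized_dual_update(lmbda, cost_estimate, budget, eta, alpha, lmbda_max):
    """One regularized dual ascent step (generic sketch, not the paper's algorithm).

    lmbda          current dual variable (Lagrange multiplier), nonnegative
    cost_estimate  estimated expected constraint cost of the current policy
    budget         constraint threshold b (a feasible policy has cost <= b)
    eta            dual step size
    alpha          regularization strength; the (1 - eta*alpha) shrinkage keeps
                   the dual iterate bounded, enabling a drift-based argument
                   in place of strong duality
    lmbda_max      projection radius for the dual variable
    """
    grad = cost_estimate - budget  # ascent direction: current constraint violation
    lmbda = (1.0 - eta * alpha) * lmbda + eta * grad  # regularized gradient ascent
    return float(np.clip(lmbda, 0.0, lmbda_max))  # project onto [0, lmbda_max]
```

With `alpha = 0` this reduces to plain projected dual ascent; the regularized variant trades a small bias for bounded multipliers, which is the structural property the drift-based analysis relies on.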
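Similarly, the weighted ridge regression estimation the abstract mentions admits a standard closed form. The following sketch shows the generic estimator used for linear mixture models, where per-sample weights (e.g., inverse variance estimates) tighten the resulting confidence ellipsoid; it is an assumption-laden illustration, and the variable names (`X`, `y`, `weights`, `reg`) are hypothetical rather than the paper's.

```python
import numpy as np

def weighted_ridge_estimate(X, y, weights, reg=1.0):
    """Weighted ridge regression for a transition parameter (generic sketch).

    X        (n, d) array; row i is a feature vector built from the linear
             mixture model's transition features
    y        (n,) array of regression targets (e.g., observed value backups)
    weights  (n,) array of per-sample weights, e.g. inverse variance
             estimates, which yield a tighter confidence set
    reg      ridge regularization parameter lambda
    """
    d = X.shape[1]
    W = weights[:, None]                      # broadcast weights over feature columns
    Sigma = X.T @ (W * X) + reg * np.eye(d)   # weighted Gram matrix plus lambda * I
    theta_hat = np.linalg.solve(Sigma, X.T @ (weights * y))
    return theta_hat, Sigma                   # Sigma defines the confidence ellipsoid
```

The estimator solves $\hat{\theta} = \arg\min_\theta \sum_i w_i (y_i - \langle x_i, \theta\rangle)^2 + \lambda \lVert\theta\rVert^2$, and the returned `Sigma` is the matrix whose ellipsoidal level sets serve as confidence regions for the unknown parameter.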