[2602.10917] Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins

arXiv - Machine Learning

Computer Science > Machine Learning

arXiv:2602.10917 (cs)

[Submitted on 11 Feb 2026 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins

Authors: Qian Zuo, Zhiyong Wang, Fengxiang He

Abstract: We study safe online reinforcement learning in Constrained Markov Decision Processes (CMDPs) under strong regret and violation metrics, which forbid error cancellation over time. Existing primal-dual methods that achieve sublinear strong reward regret inevitably incur growing strong constraint violation or are restricted to average-iterate convergence due to inherent oscillations. To address these limitations, we propose the Flexible safety Domain Optimization via Margin-regularized Exploration (FlexDOME) algorithm, the first to provably achieve near-constant $\tilde{O}(1)$ strong constraint violation alongside sublinear strong regret and non-asymptotic last-iterate convergence. FlexDOME incorporates time-varying safety margins and regularization terms into the primal-dual framework. Our theoretical analysis relies on a novel term-wise asymptotic dominance strategy, where the safety margin is rigorously scheduled to asymptotically majorize the functional decay rates of the optimization...
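The abstract's remark that strong metrics "forbid error cancellation over time" can be illustrated with a small sketch. It assumes the common convention that a strong metric clips each episode's constraint gap at zero before summing, while a weak metric sums the signed gaps first. The function names and the sample gap values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weak_violation(gaps: Sequence[float]) -> float:
    """Sum the signed per-episode gaps, then clip the total at zero."""
    return max(sum(gaps), 0.0)


def strong_violation(gaps: Sequence[float]) -> float:
    """Clip each per-episode gap at zero, then sum the clipped values."""
    return sum(max(g, 0.0) for g in gaps)


# Hypothetical constraint gaps for 8 episodes (assumed data, not from the paper).
# A positive gap means the policy played in that episode violated the constraint.
oscillating_gaps = [0.5, -0.5] * 4

# Over-satisfying in one episode cancels a violation in another
# under the weak metric, but not under the strong metric.
print(weak_violation(oscillating_gaps))
print(strong_violation(oscillating_gaps))
```

On this oscillating sequence the weak metric is zero while the strong metric grows with the number of violating episodes, which is why oscillating iterates can look safe on average yet still accumulate strong violation.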

Originally published on March 04, 2026. Curated by AI News.
