[2602.10917] Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins
Computer Science > Machine Learning
arXiv:2602.10917 (cs)
[Submitted on 11 Feb 2026 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Near-Constant Strong Violation and Last-Iterate Convergence for Online CMDPs via Decaying Safety Margins
Authors: Qian Zuo, Zhiyong Wang, Fengxiang He

Abstract: We study safe online reinforcement learning in Constrained Markov Decision Processes (CMDPs) under strong regret and violation metrics, which forbid error cancellation over time. Existing primal-dual methods that achieve sublinear strong reward regret inevitably incur growing strong constraint violation or are restricted to average-iterate convergence due to inherent oscillations. To address these limitations, we propose the Flexible safety Domain Optimization via Margin-regularized Exploration (FlexDOME) algorithm, the first to provably achieve near-constant $\tilde{O}(1)$ strong constraint violation alongside sublinear strong regret and non-asymptotic last-iterate convergence. FlexDOME incorporates time-varying safety margins and regularization terms into the primal-dual framework. Our theoretical analysis relies on a novel term-wise asymptotic dominance strategy, where the safety margin is rigorously scheduled to asymptotically majorize the functional decay rates of the optimization...
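To make the two ingredients named in the abstract concrete, here is a hedged sketch (not the paper's FlexDOME algorithm, whose details are not given in this abstract): a generic primal-dual update on a one-dimensional toy problem, combining a time-decaying safety margin eps_t with a primal regularization term. All constants (eta, tau, b, the t^{-1/2} margin schedule) are hypothetical choices for illustration only.

```python
import numpy as np

def primal_dual_with_margin(T=2000, eta=0.1, tau=0.2, b=0.5):
    """Maximize r(x) = x subject to c(x) = x <= b, for x in [0, 1].

    The dual ascent step uses the margin-tightened constraint
    c(x) <= b - eps_t, where eps_t = t^{-1/2} decays to zero, so early
    iterates stay strictly feasible while the asymptotic solution is not
    over-conservative. The small primal regularizer damps the oscillations
    that plain primal-dual dynamics exhibit, so the last iterate settles.
    """
    x, lam = 0.0, 0.0                      # primal iterate, dual variable
    for t in range(1, T + 1):
        eps_t = t ** -0.5                  # decaying safety margin
        # Regularized Lagrangian L = x - (tau/2) x^2 - lam (x - b + eps_t):
        # gradient ascent in x, then dual ascent on the tightened violation.
        x = float(np.clip(x + eta * (1.0 - tau * x - lam), 0.0, 1.0))
        lam = max(0.0, lam + eta * (x - (b - eps_t)))
    return x, lam

x_T, lam_T = primal_dual_with_margin()
# The last iterate approaches the tightened boundary b - eps_T from below.
```

Without the regularizer, gradient descent-ascent on this bilinear Lagrangian only oscillates around the saddle point, which mirrors the abstract's point that unregularized primal-dual methods are restricted to average-iterate convergence.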