[2602.17312] LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

arXiv - Machine Learning

Summary

The paper presents LexiSafe, a novel offline safe reinforcement learning framework that employs a lexicographic safety-reward hierarchy to enhance safety in cyber-physical systems.

Why It Matters

As cyber-physical systems become more prevalent, ensuring safety during reinforcement learning is critical: in offline settings, only pre-collected data are available and safety violations during training are unacceptable. LexiSafe addresses the limitations of existing methods by imposing a structured hierarchy that prioritizes safety over reward, making it relevant for researchers and practitioners in AI safety and machine learning.

Key Takeaways

  • LexiSafe introduces a lexicographic framework for offline safe reinforcement learning.
  • The framework includes both single-cost and multi-cost formulations to handle varying safety requirements.
  • Empirical results show LexiSafe reduces safety violations while improving task performance compared to existing methods.

Computer Science > Machine Learning
arXiv:2602.17312 (cs) [Submitted on 19 Feb 2026]

Title: LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy

Authors: Hsin-Jung Yang, Zhanhong Jiang, Prajwal Koirala, Qisai Liu, Cody Fleming, Soumik Sarkar

Abstract: Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicograp...
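To make the lexicographic idea concrete, here is a minimal Python sketch of how candidate policies could be ranked under a safety-first hierarchy: safety costs are compared first (each clipped at a tolerance threshold, so sufficiently safe policies tie on that dimension), and reward only breaks ties among equally safe candidates. This is an illustrative assumption about what "lexicographic safety-reward hierarchy" means in general, not the paper's actual algorithm; the function names, tolerance scheme, and numbers are hypothetical.

```python
def lexicographic_key(costs, reward, tolerances):
    """Build a sort key: safety costs dominate, reward breaks ties.

    Each cost is clipped at its tolerance, so any policy within
    tolerance on a cost dimension counts as equally safe there.
    Lower key is better (costs ascending, then reward descending).
    """
    clipped = tuple(max(c - t, 0.0) for c, t in zip(costs, tolerances))
    return clipped + (-reward,)

# Hypothetical candidate policies with (safety cost, reward) estimates.
policies = {
    "pi_a": {"costs": [0.02], "reward": 0.90},
    "pi_b": {"costs": [0.00], "reward": 0.80},
    "pi_c": {"costs": [0.30], "reward": 0.99},  # unsafe but high reward
}
tolerances = [0.05]  # single-cost setting; a multi-cost setting would list several

best = min(
    policies,
    key=lambda name: lexicographic_key(
        policies[name]["costs"], policies[name]["reward"], tolerances
    ),
)
# pi_a and pi_b are both within the safety tolerance, so reward decides
# between them; pi_c's high reward cannot outrank its safety violation.
print(best)  # -> pi_a
```

In the multi-cost setting (as in LexiSafe-MC), `costs` and `tolerances` would simply carry one entry per safety requirement, with earlier entries dominating later ones in the tuple comparison.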

