[2602.17312] LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy
Summary
The paper presents LexiSafe, an offline safe reinforcement learning framework built on a lexicographic safety-reward hierarchy and aimed at cyber-physical systems, where only pre-collected data are available.
Why It Matters
As cyber-physical systems become more prevalent, safety violations during training are unacceptable and policies must be learned from pre-collected data. Existing offline safe RL methods typically balance reward and safety through constraint relaxation or joint optimization, but often lack structural mechanisms to prevent safety drift. LexiSafe targets that gap, which makes it relevant to researchers and practitioners in AI safety and machine learning.
Key Takeaways
- LexiSafe introduces a lexicographic framework for offline safe reinforcement learning.
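- The framework includes a single-cost formulation (LexiSafe-SC) for standard offline safe RL and a multi-cost formulation (LexiSafe-MC) for hierarchical safety requirements, each with a sample-complexity analysis.
- Empirical results show LexiSafe reduces safety violations while improving task performance compared to constrained offline baselines.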
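To make the idea of a lexicographic safety-reward ordering concrete, here is a minimal Python sketch of cost-first selection among candidate actions. It is a generic illustration of lexicographic ordering, not the LexiSafe algorithm from the paper: the function name `lexicographic_argmax`, the `slack` tolerance, and the numeric estimates are all assumptions made for this example.

```python
from typing import Sequence


def lexicographic_argmax(
    costs: Sequence[float],
    rewards: Sequence[float],
    slack: float = 0.0,
) -> int:
    """Return the index of the best candidate under a cost-first ordering.

    Hypothetical helper, not taken from the paper: candidates whose estimated
    cost is within `slack` of the lowest cost are kept, and the highest
    estimated reward among them wins.
    """
    if len(costs) != len(rewards) or len(costs) == 0:
        raise ValueError("costs and rewards must be non-empty and equal length")
    best_cost = min(costs)
    # Priority 1 (safety): keep only near-minimal-cost candidates.
    safe = [i for i, c in enumerate(costs) if c <= best_cost + slack]
    # Priority 2 (reward): choose the highest reward among the safe candidates.
    return max(safe, key=lambda i: rewards[i])


# Made-up estimates for three candidate actions (illustrative only).
costs = [0.10, 0.12, 0.90]
rewards = [1.0, 3.0, 9.0]
print(lexicographic_argmax(costs, rewards))              # strict ordering
print(lexicographic_argmax(costs, rewards, slack=0.05))  # small cost tolerance
```

With `slack=0.0` the reward only breaks exact ties in cost, so the high-reward but high-cost third action is never chosen. A small positive `slack` lets reward decide among actions that are nearly as safe as the safest one.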
Computer Science > Machine Learning
arXiv:2602.17312 (cs)
[Submitted on 19 Feb 2026]
Title: LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy
Authors: Hsin-Jung Yang, Zhanhong Jiang, Prajwal Koirala, Qisai Liu, Cody Fleming, Soumik Sarkar
Abstract: Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicograp...