[2509.16650] Safe and Near-Optimal Control with Online Dynamics Learning
Summary
This paper presents a novel approach to safe and near-optimal control in dynamic environments, using online dynamics learning to ensure safety while pursuing near-optimal performance.
Why It Matters
The research addresses critical challenges in deploying autonomous agents in real-world scenarios where safety and optimality are paramount. By introducing a framework that balances exploration and safety, it contributes to advancements in robotics and control systems, potentially impacting industries like autonomous driving and drone navigation.
Key Takeaways
- Introduces maximum safe dynamics learning for optimal control.
- Ensures safety during online learning without requiring resets.
- Demonstrates effectiveness in complex scenarios like autonomous car racing.
- Operates in a non-episodic setting, differing from traditional reinforcement learning.
- Achieves close-to-optimal performance while learning the dynamics only to the extent needed.
Electrical Engineering and Systems Science > Systems and Control
arXiv:2509.16650 (eess)
Submitted on 20 Sep 2025 (v1); last revised 21 Feb 2026 (this version, v2)
Title: Safe and Near-Optimal Control with Online Dynamics Learning
Authors: Manish Prajapat, Johannes Köhler, Melanie N. Zeilinger, Andreas Krause
Abstract: Achieving both optimality and safety under unknown system dynamics is a central challenge in the real-world deployment of agents. To address this, we introduce a notion of maximum safe dynamics learning, where sufficient exploration is performed within the space of safe policies. Our method executes *pessimistically* safe policies while *optimistically* exploring informative states and, despite not reaching them due to model uncertainty, ensures continuous online learning of dynamics. The framework achieves first-of-its-kind results: learning the dynamics model sufficiently, up to an arbitrarily small tolerance (subject to noise), in finite time, while ensuring provably safe operation throughout with high probability and without requiring resets. Building on this, we propose an algorithm to maximize rewards while learning the dynamics *only to the extent needed* to achieve close-to-optimal performance. Unlike typical reinforcement learning (RL) methods, our approach operates online in a non-ep...
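To make the "pessimistic safety, optimistic exploration" idea concrete, here is a minimal toy sketch, not the paper's algorithm: per-state confidence intervals stand in for a learned dynamics model, the pessimistic safe set contains only states whose upper confidence bound satisfies the constraint under every plausible model, and exploration visits the most uncertain safe state. The dynamics `true_f`, the constraint, and the crude neighbour-shrinking update (a stand-in for a proper Gaussian-process posterior) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting: unknown dynamics f(x) with per-state confidence
# intervals [mu - beta*sigma, mu + beta*sigma] maintained from data.
states = np.linspace(0.0, 1.0, 21)
mu = np.zeros_like(states)           # posterior mean of f at each state
sigma = np.full_like(states, 1.0)    # posterior std (shrinks with data)
sigma[:3] = 0.05                     # assumed known-safe seed region
beta, safety_limit = 2.0, 0.5        # constraint: f(x) <= safety_limit

def true_f(x):
    return 0.4 * np.sin(3.0 * x)     # hypothetical ground-truth dynamics

for _ in range(50):
    # Pessimistic safe set: the upper confidence bound must satisfy the
    # constraint, i.e. safety holds under every plausible model.
    safe = mu + beta * sigma <= safety_limit
    # Optimistic exploration: among pessimistically safe states, visit
    # the most uncertain (most informative) one.
    idx = np.flatnonzero(safe)[np.argmax(sigma[safe])]
    y = true_f(states[idx]) + 0.01 * rng.normal()
    # Crude stand-in for a GP update: shrink the visited state's
    # interval and, via correlation, its neighbours' intervals too.
    mu[idx], sigma[idx] = 0.5 * (mu[idx] + y), 0.5 * sigma[idx]
    for j in (idx - 1, idx + 1):
        if 0 <= j < len(states):
            mu[j] = 0.7 * mu[j] + 0.3 * y
            sigma[j] *= 0.8

safe = mu + beta * sigma <= safety_limit
print(int(safe.sum()))  # pessimistically safe states after exploration
```

The safe set starts as the seed region and expands as uncertainty shrinks near its frontier, which mirrors the abstract's point that informative states can be targeted optimistically while every executed policy remains pessimistically safe.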