[2509.16650] Safe and Near-Optimal Control with Online Dynamics Learning

arXiv - Machine Learning · 4 min read · Article

Summary

This article presents an approach to safe and near-optimal control under unknown system dynamics, using online dynamics learning to guarantee safety throughout operation while approaching optimal performance.

Why It Matters

The research addresses critical challenges in deploying autonomous agents in real-world scenarios where safety and optimality are paramount. By introducing a framework that balances exploration and safety, it contributes to advancements in robotics and control systems, potentially impacting industries like autonomous driving and drone navigation.

Key Takeaways

  • Introduces maximum safe dynamics learning for optimal control.
  • Ensures safety during online learning without requiring resets.
  • Demonstrates effectiveness in complex scenarios like autonomous car racing.
  • Operates in a non-episodic setting, differing from traditional reinforcement learning.
  • Achieves close-to-optimal performance with minimal necessary dynamics learning.

Electrical Engineering and Systems Science > Systems and Control

arXiv:2509.16650 (eess) [Submitted on 20 Sep 2025 (v1), last revised 21 Feb 2026 (this version, v2)]

Title: Safe and Near-Optimal Control with Online Dynamics Learning

Authors: Manish Prajapat, Johannes Köhler, Melanie N. Zeilinger, Andreas Krause

Abstract: Achieving both optimality and safety under unknown system dynamics is a central challenge in the real-world deployment of agents. To address this, we introduce a notion of maximum safe dynamics learning, in which sufficient exploration is performed within the space of safe policies. Our method executes pessimistically safe policies while optimistically exploring informative states and, even when those states are not reached due to model uncertainty, ensures continuous online learning of the dynamics. The framework achieves first-of-its-kind results: it learns the dynamics model sufficiently, up to an arbitrarily small tolerance (subject to noise), in finite time, while ensuring provably safe operation throughout with high probability and without requiring resets. Building on this, we propose an algorithm that maximizes rewards while learning the dynamics only to the extent needed to achieve close-to-optimal performance. Unlike typical reinforcement learning (RL) methods, our approach operates online in a non-episodic ...
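The abstract's core mechanism, acting pessimistically for safety while exploring optimistically for information, can be illustrated with a deliberately simple sketch. Everything below is hypothetical and not the authors' algorithm: a toy 1D system, Lipschitz-based confidence bounds standing in for the paper's learned dynamics model, a grid of candidate actions, and a known-safe backup action.

```python
import numpy as np

# Toy 1D system standing in for the unknown dynamics (illustrative only).
def true_dynamics(x, u):
    return 0.95 * x + 0.1 * u

class LipschitzModel:
    """Data-driven confidence bounds from a known Lipschitz constant.

    A deliberately simple stand-in for a learned dynamics model: any
    function consistent with the observed data and the Lipschitz constant
    L must lie between lo(x, u) and hi(x, u).
    """
    def __init__(self, L):
        self.L = L
        self.data = []  # list of ((x, u), y) observations

    def add(self, x, u, y):
        self.data.append(((x, u), y))

    def bounds(self, x, u):
        lo, hi = -np.inf, np.inf
        for (xi, ui), yi in self.data:
            d = self.L * (abs(x - xi) + abs(u - ui))
            lo = max(lo, yi - d)   # worst-case lower envelope
            hi = min(hi, yi + d)   # worst-case upper envelope
        return lo, hi

limit = 1.0                      # safety constraint: |x| <= limit
model = LipschitzModel(L=1.0)    # valid Lipschitz constant for the toy system
x = 0.0
model.add(x, 0.0, true_dynamics(x, 0.0))  # one seed observation

trajectory = []
for t in range(30):
    # Pessimistic safety filter: keep only actions whose *entire*
    # confidence interval for the next state satisfies the constraint.
    safe = []
    for u in np.linspace(-1.0, 1.0, 21):
        lo, hi = model.bounds(x, u)
        if lo >= -limit and hi <= limit:
            safe.append((hi - lo, u))
    if safe:
        # Optimistic exploration: among the safe actions, pick the most
        # informative one (widest confidence interval).
        _, u = max(safe)
    else:
        u = 0.0  # known-safe backup action (contracts the toy system)
    y = true_dynamics(x, u)
    model.add(x, u, y)           # online learning, no resets
    x = y
    trajectory.append(x)
```

The split mirrors the abstract: safety is certified against every model consistent with the data (pessimism), while the action choice targets where the model is most uncertain (optimism), so the dynamics are learned continuously without ever leaving the safe set.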

