[2602.12492] Composable Model-Free RL for Navigation with Input-Affine Systems

arXiv - Machine Learning · 3 min read

Summary

This paper presents a composable, model-free reinforcement learning approach for navigation in dynamic environments: it learns a value function and policy for each environment element (goal or obstacle) and composes them online for real-time goal reaching and collision avoidance.

Why It Matters

As autonomous robots increasingly operate in complex real-world scenarios, effective navigation strategies are crucial. This research advances model-free reinforcement learning with a framework that improves both safety and efficiency in robotic navigation, and its formal obstacle-avoidance guarantees make it relevant wherever learned controllers must act safely in real time.

Key Takeaways

  • Introduces a composable model-free RL method for navigation.
  • Derives a continuous-time Hamilton-Jacobi-Bellman equation for value functions.
  • Demonstrates improved performance over standard RL baselines such as PPO.
  • Provides formal guarantees for obstacle avoidance using a QCQP framework.
  • Focuses on real-time learning for dynamic environments.
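The continuous-time structure behind the second and third takeaways can be sketched as follows. This is a standard derivation under the paper's input-affine assumption; the running cost $q$ and control weight $R$ are generic placeholders rather than the paper's exact notation, and discounting is omitted for brevity:

```latex
% Input-affine dynamics and the (undiscounted) HJB equation:
\dot{x} = f(x) + g(x)\,u, \qquad
0 = \min_{u}\Big[\, q(x) + \tfrac{1}{2}\,u^{\top} R\, u
      + \nabla V(x)^{\top}\big(f(x) + g(x)\,u\big) \Big].

% Minimizing over u gives a policy linear in the value gradient:
u^{*}(x) = -R^{-1} g(x)^{\top} \nabla V(x),

% and an advantage function that is quadratic in the action:
A(x,u) = \tfrac{1}{2}\,\big(u - u^{*}(x)\big)^{\top} R\,\big(u - u^{*}(x)\big).
```

Because the advantage is quadratic in $u$, safety conditions expressed through $A$ become quadratic constraints, which is what makes the QCQP composition in the next takeaway possible.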

Computer Science > Robotics
arXiv:2602.12492 (cs) · Submitted on 13 Feb 2026
Title: Composable Model-Free RL for Navigation with Input-Affine Systems
Authors: Xinhuan Sang, Abdelrahman Abdelgawad, Roberto Tron

Abstract: As autonomous robots move into complex, dynamic real-world environments, they must learn to navigate safely in real time, yet anticipating all possible behaviors is infeasible. We propose a composable, model-free reinforcement learning method that learns a value function and an optimal policy for each individual environment element (e.g., goal or obstacle) and composes them online to achieve goal reaching and collision avoidance. Assuming unknown nonlinear dynamics that evolve in continuous time and are input-affine, we derive a continuous-time Hamilton-Jacobi-Bellman (HJB) equation for the value function and show that the corresponding advantage function is quadratic in the action and optimal policy. Based on this structure, we introduce a model-free actor-critic algorithm that learns policies and value functions for static or moving obstacles using gradient descent. We then compose multiple reach/avoid models via a quadratically constrained quadratic program (QCQP), yielding formal obstacle-avoidance guarantees in terms of value-function level sets, providing a model-free alternative to CLF/CBF-bas...
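The QCQP composition described in the abstract can be illustrated with a minimal sketch. The quadratic advantage models below are made-up stand-ins for the learned per-element critics (the paper learns these model-free; the coefficients and the use of a generic NLP solver here are assumptions for illustration only):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical learned advantage models, each quadratic in the action u:
#   A(u) = 0.5 * u^T H u + b^T u + c
# In the paper these come from per-element critics; the numbers are invented.
def quad_advantage(H, b, c):
    return lambda u: 0.5 * u @ H @ u + b @ u + c

goal_adv = quad_advantage(np.eye(2), np.array([-1.0, -1.0]), 0.0)   # goal critic
obs_adv = quad_advantage(np.eye(2), np.array([0.0, 1.5]), -0.5)     # obstacle critic

# The greedy goal action ignores the obstacle and is unsafe here:
u_greedy = np.array([1.0, 1.0])   # argmin of goal_adv alone
assert obs_adv(u_greedy) > 0      # violates the obstacle level-set condition

# QCQP composition: minimize the goal advantage subject to the obstacle
# advantage staying non-positive (a stand-in for the level-set constraint).
res = minimize(
    goal_adv,
    x0=np.zeros(2),
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda u: -obs_adv(u)}],
)
u_safe = res.x  # composed action: safe, at some cost in goal progress
```

A dedicated QCQP solver would be used in practice; `scipy.optimize.minimize` with SLSQP is only a convenient way to show the structure, namely a quadratic objective from the goal critic constrained by quadratic level-set conditions from each obstacle critic.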
