[2602.14587] Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow
Summary
This paper presents a novel decoupled continuous-time reinforcement learning algorithm based on Hamiltonian flow, addressing the failure of standard discrete-time methods at small time steps and outperforming both existing continuous-time and leading discrete-time baselines in empirical tests.
Why It Matters
The research tackles significant limitations of reinforcement learning in continuous-time environments, which are prevalent in real-world applications like finance and robotics. Its decoupled actor-critic approach could lead to more efficient and reliable learning algorithms for complex control tasks.
Key Takeaways
- Introduces a decoupled actor-critic algorithm for continuous-time RL.
- Proves convergence through new probabilistic arguments.
- Outperforms existing continuous-time and leading discrete-time methods.
- Achieves significant profit improvements in real-world trading tasks.
- Addresses the complexities of training in continuous-time environments.
Computer Science > Machine Learning
arXiv:2602.14587 (cs) [Submitted on 16 Feb 2026]
Title: Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow
Authors: Minh Nguyen
Abstract: Many real-world control problems, ranging from finance to robotics, evolve in continuous time with non-uniform, event-driven decisions. Standard discrete-time reinforcement learning (RL), based on fixed-step Bellman updates, struggles in this setting: as time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking. Existing continuous-time methods reintroduce action information via an advantage-rate function $q$. However, they enforce optimality through complicated martingale losses or orthogonality constraints, which are sensitive to the choice of test processes. These approaches entangle $V$ and $q$ into a large, complex optimization problem that is difficult to train reliably. To address these limitations, we propose a novel decoupled continuous-time actor-critic algorithm with alternating updates: $q$ is learned from diffusion generators on $V$, and $V$ is updated via a Hamiltonian-based value flow that remains informative under infinitesimal time steps, where standard max/softmax backups fail. Theoretically, we prove rigorous convergence via new probabilistic arguments, sidestepping the challenge that...
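The $Q$-collapse described in the abstract can be illustrated with a toy numerical sketch (my own example, not code from the paper): with a one-step value $Q_{\Delta t}(s,a) = r(s,a)\,\Delta t + e^{-\beta \Delta t}\,V(s')$ and dynamics $s' = s + a\,\Delta t$, the gap between the values of different actions vanishes as $\Delta t \to 0$, so $Q$ no longer ranks actions. The reward, dynamics, and value function below are assumed toy choices.

```python
import math

# Toy illustration (not from the paper): discrete-time Q-values
# collapse to V as the step size shrinks, erasing action ranking.
# Assumed model: V(s) = -s^2, dynamics s' = s + a*dt,
# reward rate r(s, a) = -(s^2 + a^2), discount exp(-beta*dt).

def V(s):
    return -s ** 2

def Q(s, a, dt, beta=1.0):
    r = -(s ** 2 + a ** 2)
    s_next = s + a * dt
    return r * dt + math.exp(-beta * dt) * V(s_next)

s = 1.0
for dt in (1.0, 0.1, 0.01, 0.001):
    gap = abs(Q(s, 1.0, dt) - Q(s, -1.0, dt))
    print(f"dt={dt:6.3f}  |Q(s,+1) - Q(s,-1)| = {gap:.6f}")
```

In this toy model the gap is exactly $4\,\Delta t\,e^{-\beta \Delta t}$, which shrinks linearly in $\Delta t$; this is the degeneracy that motivates working with an advantage-rate function $q$ instead of $Q$.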
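The abstract's step of learning $q$ "from diffusion generators on $V$" can be sketched schematically. In continuous-time RL, the advantage-rate function is typically defined through the generator of the state diffusion $ds = b(s,a)\,dt + \sigma(s,a)\,dW$ as $q(s,a) = r(s,a) + b(s,a)\,\partial_s V + \tfrac{1}{2}\sigma(s,a)^2\,\partial_s^2 V - \beta V(s)$. The functions `b`, `sigma`, `r` and the finite-difference approximation below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the advantage-rate function q, obtained by
# applying the diffusion generator to V (1-D state, finite differences).
# All model ingredients here are assumed toy choices, not from the paper.

def advantage_rate(V, r, b, sigma, beta, s, a, h=1e-4):
    dV = (V(s + h) - V(s - h)) / (2 * h)            # approx V'(s)
    d2V = (V(s + h) - 2 * V(s) + V(s - h)) / h ** 2  # approx V''(s)
    # q = r + (generator applied to V) - beta * V
    return r(s, a) + b(s, a) * dV + 0.5 * sigma(s, a) ** 2 * d2V - beta * V(s)

# Toy instantiation: V(s) = -s^2, drift b = a, constant volatility.
V = lambda s: -s ** 2
r = lambda s, a: -(s ** 2 + a ** 2)
b = lambda s, a: a
sigma = lambda s, a: 0.5
q = advantage_rate(V, r, b, sigma, beta=1.0, s=1.0, a=1.0)
print(f"q(1, 1) ~ {q:.4f}")  # analytic value for this toy model: -3.25
```

Unlike the collapsing $Q$-values, $q$ retains a genuine dependence on the action $a$ at every state, which is why continuous-time methods build their critics around it.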