[2602.12829] FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
Summary
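The paper presents FLAC (Field Least-Energy Actor-Critic), a likelihood-free framework for Maximum Entropy Reinforcement Learning with iterative generative policies. Instead of computing action densities, it controls policy stochasticity by regularizing the kinetic energy of the policy's velocity field.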
Why It Matters
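Maximum Entropy RL needs action log-densities, but iterative generative policies such as diffusion models and flow matching, which are highly expressive for continuous control, do not expose those densities directly. FLAC sidesteps this by penalizing the kinetic energy of the velocity field, which keeps the policy close to a high-entropy reference without evaluating densities. This makes expressive generative policies usable in maximum-entropy training for continuous-control problems, including robotics.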
Key Takeaways
- FLAC introduces a likelihood-free framework for policy optimization.
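- The approach regulates policy stochasticity by penalizing the kinetic energy of the velocity field, which serves as a proxy for divergence from a high-entropy reference.
- It formulates policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform).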
- Empirical results show FLAC's superior performance on high-dimensional benchmarks.
- Because it never evaluates action log-densities, the method needs no explicit density estimation to enforce the maximum-entropy objective.
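The core mechanism in these takeaways, penalizing the kinetic energy of the velocity field instead of computing log-densities, can be illustrated in a few lines. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the linear `velocity` model, the uniform base on [-1, 1]^d, the Euler(-Maruyama) discretization, the helper names `rollout_with_energy` and `energy_regularized_actor_loss`, and the fixed coefficient `alpha` are all hypothetical. It shows that the regularizer only needs velocities evaluated along the sampled path, so no action log-density is ever computed.

```python
import numpy as np


def velocity(a, t, s, params):
    """Hypothetical toy velocity field v(a, t | s) = a @ W_a + s @ W_s + t * b.

    A real flow policy would use a neural network; a linear map keeps the
    sketch self-contained.
    """
    W_a, W_s, b = params
    return a @ W_a + s @ W_s + t * b


def rollout_with_energy(s, params, n_steps=10, sigma=0.0, rng=None):
    """Integrate the action flow from t=0 to t=1 and accumulate kinetic energy.

    Assumption: actions start from a uniform base on [-1, 1]^d as the
    high-entropy reference. Returns terminal actions and the per-sample
    path energy 0.5 * sum_k ||v_k||^2 * dt.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    batch, act_dim = s.shape[0], params[0].shape[0]
    a = rng.uniform(-1.0, 1.0, size=(batch, act_dim))
    dt = 1.0 / n_steps
    energy = np.zeros(batch)
    for k in range(n_steps):
        v = velocity(a, k * dt, s, params)
        energy += 0.5 * np.sum(v ** 2, axis=1) * dt  # kinetic energy of this step
        noise = sigma * np.sqrt(dt) * rng.standard_normal(a.shape)  # zero if sigma == 0
        a = a + v * dt + noise  # Euler (sigma == 0) or Euler-Maruyama step
    return a, energy


def energy_regularized_actor_loss(q_values, energy, alpha):
    """Maximize Q while penalizing path energy; alpha is a fixed hypothetical weight."""
    return float(np.mean(-q_values + alpha * energy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, act_dim, batch = 3, 2, 4
    params = (0.1 * rng.standard_normal((act_dim, act_dim)),
              0.1 * rng.standard_normal((obs_dim, act_dim)),
              0.1 * rng.standard_normal(act_dim))
    s = rng.standard_normal((batch, obs_dim))
    actions, energy = rollout_with_energy(s, params, rng=rng)
    q = -np.sum(actions ** 2, axis=1)  # hypothetical critic that prefers actions near 0
    print(energy_regularized_actor_loss(q, energy, alpha=0.1))
```

With a zero velocity field and the default `sigma=0`, the terminal actions stay at the uniform base and the penalty is zero, matching the intuition that the cheapest policy under this regularizer is the high-entropy reference itself; any move toward high-return actions has to pay in kinetic energy.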
arXiv:2602.12829 (Computer Science, Machine Learning). Submitted on 13 Feb 2026.
Authors: Lei Lv, Yunfei Li, Yu Luo, Fuchun Sun, Xiao Ma
Abstract: Iterative generative policies, such as diffusion models and flow matching, offer superior expressivity for continuous control but complicate Maximum Entropy Reinforcement Learning because their action log-densities are not directly accessible. To address this, we propose Field Least-Energy Actor-Critic (FLAC), a likelihood-free framework that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. Our key insight is to formulate policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform). Under this view, the maximum-entropy principle emerges naturally as staying close to a high-entropy reference while optimizing return, without requiring explicit action densities. In this framework, kinetic energy serves as a physically grounded proxy for divergence from the reference: minimizing path-space energy bounds the deviation of the induced terminal action distribution. Building on this view, we derive an energy-regularized policy iteration scheme and a practical off-policy algorithm that automatically tunes the kine...
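The abstract's claim that path-space energy bounds the deviation of the terminal action distribution has a standard form in the stochastic-control view of Schrödinger bridges. The derivation below is a textbook sketch under assumptions that are ours, not the paper's: a Brownian reference Q with noise scale σ and initial law μ_0, a controlled process P^u with the same noise and initial law, and regularity conditions (e.g., Novikov's) under which Girsanov's theorem applies. The paper's exact bound and its reference process (e.g., uniform) may differ.

```latex
\begin{aligned}
&\text{Reference } Q:\quad dX_t = \sigma\, dW_t, \qquad X_0 \sim \mu_0,\\
&\text{Policy } P^{u}:\quad dX_t = u_t(X_t)\, dt + \sigma\, dW_t, \qquad X_0 \sim \mu_0,\\[4pt]
&\mathrm{KL}\!\left(P^{u}\,\|\,Q\right) = \frac{1}{2\sigma^{2}}\,
  \mathbb{E}_{P^{u}}\!\left[\int_0^1 \lVert u_t(X_t)\rVert^{2}\, dt\right]
  \quad\text{(Girsanov)},\\[4pt]
&\mathrm{KL}\!\left(p^{u}_1\,\|\,q_1\right) \le \mathrm{KL}\!\left(P^{u}\,\|\,Q\right)
  \quad\text{(data processing on the terminal marginals)},\\[4pt]
&\mathrm{TV}\!\left(p^{u}_1,\, q_1\right) \le \sqrt{\tfrac12\,\mathrm{KL}\!\left(p^{u}_1\,\|\,q_1\right)}
  \quad\text{(Pinsker)}.
\end{aligned}
```

Driving the kinetic energy toward zero therefore forces the terminal action distribution p_1^u toward the reference marginal q_1, and adding a return term to the same objective yields the trade-off the abstract describes: stay close to a high-entropy reference while optimizing return.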