[2602.18117] Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Summary
The paper presents Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a method that improves sample efficiency and exploration in offline-to-online reinforcement learning by injecting noise into the training of a flow matching-based policy.
Why It Matters
This research addresses a significant challenge in reinforcement learning: the transition from offline pre-training to online fine-tuning. By improving exploration during this transition, it can yield more sample-efficient learning algorithms, which matter wherever online interaction is costly or limited.
Key Takeaways
- FINO enhances sample efficiency in offline-to-online reinforcement learning.
- Injecting noise into policy training promotes better exploration of actions.
- Combining flow matching with entropy-guided sampling balances exploration and exploitation.
- Experiments show FINO outperforms existing methods under limited online budgets.
- The approach is relevant for various challenging tasks in reinforcement learning.
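The takeaways above can be made concrete with a small sketch. The summary does not specify where FINO injects noise, so the example below makes an assumption: it perturbs the dataset actions with Gaussian noise before forming the standard flow-matching interpolant, so the learned velocity field covers a broader action region than the offline data alone. The function name `fino_flow_matching_targets` and the `noise_scale` parameter are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fino_flow_matching_targets(actions, t, noise_scale=0.1, rng=rng):
    """Build flow-matching regression pairs with injected noise.

    Hypothetical sketch: perturb the dataset actions (x1) with
    Gaussian noise, then form the usual linear interpolant between
    a base sample x0 ~ N(0, I) and the noised target x1.
    """
    # Injected noise: broadens targets beyond the offline dataset.
    x1 = actions + noise_scale * rng.standard_normal(actions.shape)
    x0 = rng.standard_normal(actions.shape)   # base sample ~ N(0, I)
    t = t.reshape(-1, 1)
    xt = (1.0 - t) * x0 + t * x1              # point on the probability path
    target = x1 - x0                          # conditional velocity to regress
    return xt, target

actions = rng.standard_normal((4, 2))         # batch of 2-D actions
t = rng.uniform(size=4)                       # one time per sample
xt, target = fino_flow_matching_targets(actions, t)
```

A policy network would then be trained to regress `target` from `(xt, t)`, exactly as in plain flow matching; only the noised `x1` differs from the standard recipe.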
Computer Science > Machine Learning, arXiv:2602.18117 (cs)
[Submitted on 20 Feb 2026]
Title: Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Authors: Yongjae Shin, Jongseong Chae, Jongeui Park, Youngchul Sung
Abstract: Generative models have recently demonstrated remarkable success across diverse domains, motivating their adoption as expressive policies in reinforcement learning (RL). While they have shown strong performance in offline RL, particularly where the target distribution is well defined, their extension to online fine-tuning has largely been treated as a direct continuation of offline pre-training, leaving key challenges unaddressed. In this paper, we propose Flow Matching with Injected Noise for Offline-to-Online RL (FINO), a novel method that leverages flow matching-based policies to enhance sample efficiency for offline-to-online RL. FINO facilitates effective exploration by injecting noise into policy training, thereby encouraging a broader range of actions beyond those observed in the offline dataset. In addition to exploration-enhanced flow policy training, we combine an entropy-guided sampling mechanism to balance exploration and exploitation, allowing the policy to adapt its behavior throughout online fine-tuning. Experiments across diverse, challenging t...
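The abstract's entropy-guided sampling mechanism is described only at a high level, so the following is a hedged sketch of one plausible realization: among candidate actions drawn from the flow policy, select via a softmax over Q-values whose temperature is raised until the selection distribution reaches a target entropy, so that uncertain states sample broadly (exploration) and confident states pick near-greedily (exploitation). The function `entropy_guided_select` and the `target_entropy` parameter are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy_guided_select(candidate_actions, q_values, target_entropy=1.0):
    """Select one of several candidate actions with an entropy constraint.

    Hypothetical sketch: sweep softmax temperatures from greedy to
    near-uniform and stop at the first whose selection distribution
    has at least `target_entropy` nats, then sample an action from it.
    """
    p = np.full(len(q_values), 1.0 / len(q_values))  # fallback: uniform
    entropy = np.log(len(q_values))
    for temp in np.geomspace(0.01, 100.0, 60):       # low temp = greedy
        logits = q_values / temp
        logits -= logits.max()                       # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        entropy = -(p * np.log(p + 1e-12)).sum()
        if entropy >= target_entropy:                # enough exploration
            break
    idx = rng.choice(len(q_values), p=p)
    return candidate_actions[idx], entropy

candidates = rng.standard_normal((8, 2))             # 8 candidate 2-D actions
q = rng.standard_normal(8)                           # their critic values
action, ent = entropy_guided_select(candidates, q, target_entropy=1.0)
```

Lowering `target_entropy` over the course of online fine-tuning would shift this selector from exploratory toward exploitative behavior, matching the adaptive balance the abstract describes.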