[2512.03973] Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning

arXiv - Machine Learning · 3 min read

About this article

Computer Science > Machine Learning

arXiv:2512.03973 (cs) [Submitted on 3 Dec 2025 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning

Authors: Franki Nguimatsia Tiofack, Théotime Le Hellard, Fabian Schramm, Nicolas Perrin-Gilbert, Justin Carpentier

Abstract: Offline reinforcement learning often relies on behavior regularization that enforces policies to remain close to the dataset distribution. However, such approaches fail to distinguish between high-value and low-value actions in their regularization components. We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor. The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value actions from the dataset rather than indiscriminately imitating all state-action pairs. In turn, the flow policy constrains the actor to remain aligned with the dataset's best transitions while maximizing the critic. This mutual guidance enables GFP to achieve state-of-the-art performance across 144 state- and pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and challenging tasks.

Webpage: this https URL
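To make the abstract's mutual-guidance idea concrete, below is a minimal PyTorch-style sketch of the two coupled objectives it describes: value-weighted behavior cloning for a multi-step flow-matching policy, and a distilled one-step actor that maximizes the critic while staying anchored to the flow policy. Everything here is an assumption for illustration; the network shapes, the exponentiated-advantage weight, the Euler sampler, and the loss coefficients are not taken from the paper or its code.

```python
# A minimal sketch of the coupling described above, assuming a PyTorch-style
# setup. All names, dimensions, and weighting choices are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, HIDDEN = 17, 6, 256  # placeholder sizes

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, out_dim))

flow_net = mlp(STATE_DIM + ACTION_DIM + 1, ACTION_DIM)  # velocity field v(s, a_t, t)
actor    = mlp(STATE_DIM + ACTION_DIM, ACTION_DIM)      # one-step actor(state, noise)
critic   = mlp(STATE_DIM + ACTION_DIM, 1)               # Q(s, a)
value    = mlp(STATE_DIM, 1)                            # V(s), used for advantages

def weighted_flow_matching_loss(state, action, weight):
    """Conditional flow matching toward dataset actions, up-weighting
    high-value samples instead of imitating all pairs uniformly."""
    t = torch.rand(state.size(0), 1)            # random time in [0, 1]
    noise = torch.randn_like(action)
    a_t = (1 - t) * noise + t * action          # linear interpolation path
    target_v = action - noise                   # straight-line velocity target
    pred_v = flow_net(torch.cat([state, a_t, t], dim=-1))
    per_sample = ((pred_v - target_v) ** 2).mean(dim=-1)
    return (weight * per_sample).mean()

@torch.no_grad()
def sample_flow(state, steps=8):
    """Draw an action by integrating the velocity field with Euler steps."""
    a = torch.randn(state.size(0), ACTION_DIM)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((state.size(0), 1), i * dt)
        a = a + dt * flow_net(torch.cat([state, a, t], dim=-1))
    return a

# One illustrative update on a dummy batch (real code would loop over
# offline data with separate optimizers and a trained critic).
state  = torch.randn(32, STATE_DIM)
action = torch.randn(32, ACTION_DIM)

with torch.no_grad():
    # Exponentiated advantage as the "high-value" weight; any monotone
    # transform of Q - V would fit the abstract's description.
    adv = critic(torch.cat([state, action], dim=-1)) - value(state)
    weight = torch.exp(adv.clamp(max=5.0)).squeeze(-1)

flow_loss = weighted_flow_matching_loss(state, action, weight)

# Actor: maximize the critic while distilling toward the flow policy's
# samples, so it stays aligned with the dataset's best transitions.
noise = torch.randn(32, ACTION_DIM)
actor_action = actor(torch.cat([state, noise], dim=-1))
actor_loss = (-critic(torch.cat([state, actor_action], dim=-1)).mean()
              + F.mse_loss(actor_action, sample_flow(state)))

(flow_loss + actor_loss).backward()
```

The two losses mirror the guidance loop the abstract describes: the advantage weight, computed from the same critic the actor maximizes, steers the flow policy toward high-value dataset actions, while the distillation term keeps the one-step actor aligned with the flow policy's samples.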

Originally published on March 06, 2026. Curated by AI News.

Related Articles

LLMs

[P] I built an autonomous ML agent that runs experiments on tabular data indefinitely - inspired by Karpathy's AutoResearch

Inspired by Andrej Karpathy's AutoResearch, I built a system where Claude Code acts as an autonomous ML researcher on tabular binary clas...

Reddit - Machine Learning · 1 min
Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min
Machine Learning

[P] I tested Meta’s brain-response model on posts. It predicted the Elon one almost perfectly.

I built an experimental UI and visualization layer around Meta’s open brain-response model just to see whether this stuff actually works ...

Reddit - Machine Learning · 1 min
Machine Learning

[D] Why does it seem like open source materials on ML are incomplete? this is not enough...

Many times when I try to deeply understand a topic in machine learning — whether it's a new architecture, a quantization method, a full t...

Reddit - Machine Learning · 1 min