[2604.09035] Advantage-Guided Diffusion for Model-Based Reinforcement Learning

arXiv - AI April 13, 2026 4 min read

Computer Science > Artificial Intelligence
arXiv:2604.09035 (cs)
[Submitted on 10 Apr 2026]

Title: Advantage-Guided Diffusion for Model-Based Reinforcement Learning

Authors: Daniele Foffano, Arvid Eriksson, David Broman, Karl H. Johansson, Alexandre Proutiere

Abstract: Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, whereas diffusion world models mitigate this by generating trajectory segments jointly. However, existing diffusion guides are either policy-only, discarding value information, or reward-based, which becomes myopic when the diffusion horizon is short. We introduce Advantage-Guided Diffusion for MBRL (AGD-MBRL), which steers the reverse diffusion process using the agent's advantage estimates so that sampling concentrates on trajectories expected to yield higher long-term return beyond the generated window. We develop two guides: (i) Sigmoid Advantage Guidance (SAG) and (ii) Exponential Advantage Guidance (EAG). We prove that a diffusion model guided through SAG or EAG allows us to perform reweighted sampling of trajectories with weights increasing in state-action advantage, implying policy improvement under standard assumptions. Additionally, we show that the trajectories generated from AGD-MBRL follow an improved policy (that is, with higher v...
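The reweighted-sampling idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact SAG or EAG formulas, so the weight functions below (a sigmoid and an exponential of the advantage divided by a temperature) are assumptions chosen only because they increase with advantage; the function names, the `temperature` parameter, and the resampling over a finite candidate set are likewise hypothetical and stand in for guidance applied inside the reverse diffusion process.

```python
import numpy as np


def sigmoid_weights(advantages, temperature=1.0):
    """Assumed sigmoid-style weight: bounded in (0, 1), increasing in advantage."""
    z = np.asarray(advantages, dtype=float) / temperature
    return 1.0 / (1.0 + np.exp(-z))


def exponential_weights(advantages, temperature=1.0):
    """Assumed exponential-style weight: unbounded, increasing in advantage."""
    z = np.asarray(advantages, dtype=float) / temperature
    # Subtracting the max leaves the normalized weights unchanged
    # and avoids overflow for large advantages.
    return np.exp(z - z.max())


def reweighted_sample(candidates, advantages, weight_fn, n_samples, rng):
    """Resample candidates with probability proportional to weight_fn(advantage).

    This stands in for guided sampling: a real diffusion guide would bias the
    reverse process itself rather than resample finished trajectories.
    """
    weights = weight_fn(advantages)
    probs = weights / weights.sum()
    idx = rng.choice(len(candidates), size=n_samples, p=probs)
    return [candidates[i] for i in idx], probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical trajectory labels and advantage estimates.
    trajectories = ["traj_low", "traj_mid", "traj_high"]
    advantages = np.array([-1.0, 0.0, 2.0])
    for fn in (sigmoid_weights, exponential_weights):
        _, probs = reweighted_sample(trajectories, advantages, fn, 5, rng)
        print(fn.__name__, np.round(probs, 3))
```

Under these assumed forms, the sigmoid weight saturates for large advantages while the exponential weight keeps growing, so the exponential variant concentrates sampling more sharply on the highest-advantage trajectories at the same temperature.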

Originally published on April 13, 2026. Curated by AI News.
