[2602.08032] Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
Summary
The paper presents Horizon Imagination (HI), an on-policy imagination process for reinforcement learning with diffusion-based world models that denoises multiple future observations in parallel, improving rollout efficiency in control tasks while maintaining performance.
Why It Matters
This research addresses a critical efficiency challenge in reinforcement learning: diffusion world models offer high generative fidelity, but existing methods either require heavyweight models at inference or rely on highly sequential imagination. By reducing the computational cost of on-policy rollouts without sacrificing that fidelity, the work is relevant to researchers and practitioners in AI and robotics.
Key Takeaways
- Horizon Imagination (HI) improves efficiency in reinforcement learning tasks.
- The method allows parallel denoising of future observations, reducing computational costs.
- HI maintains control performance with a sub-frame budget of half the denoising steps, while achieving superior generation quality under varied sampling schedules.
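The parallel-denoising idea in the takeaways can be sketched with a toy model. Everything below (the denoiser stub, function names, shapes) is a hypothetical illustration, not the paper's implementation: the point is that instead of fully denoising frame t before starting frame t+1, the whole imagined horizon is refined jointly at every denoising step.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(frames):
    # Toy stand-in for one diffusion denoising step on a *batch* of frames:
    # nudge every frame halfway toward a fixed "clean" target (zeros here).
    target = np.zeros_like(frames)
    return frames + 0.5 * (target - frames)

def horizon_imagination(horizon=8, obs_dim=4, steps=6):
    # Initialize every future observation in the horizon as pure noise,
    # then refine ALL of them in parallel at each denoising step,
    # rather than fully denoising frame t before starting frame t+1.
    frames = rng.normal(size=(horizon, obs_dim))
    for _ in range(steps):
        frames = denoise_step(frames)
    return frames

out = horizon_imagination()
print(out.shape)  # (8, 4): the whole imagined horizon, denoised jointly
```

Because each step operates on the full `(horizon, obs_dim)` batch, the number of sequential denoiser calls depends on the step budget, not on the horizon length.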
Computer Science > Machine Learning
arXiv:2602.08032 (cs)
[Submitted on 8 Feb 2026 (v1), last revised 17 Feb 2026 (this version, v2)]
Title: Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
Authors: Lior Cohen, Ofir Nabati, Kaixin Wang, Navdeep Kumar, Shie Mannor
Abstract: We study diffusion-based world models for reinforcement learning, which offer high generative fidelity but face critical efficiency challenges in control. Current methods either require heavyweight models at inference or rely on highly sequential imagination, both of which impose prohibitive computational costs. We propose Horizon Imagination (HI), an on-policy imagination process for discrete stochastic policies that denoises multiple future observations in parallel. HI incorporates a stabilization mechanism and a novel sampling schedule that decouples the denoising budget from the effective horizon over which denoising is applied while also supporting sub-frame budgets. Experiments on Atari 100K and Craftium show that our approach maintains control performance with a sub-frame budget of half the denoising steps and achieves superior generation quality under varied schedules. Code is available at this https URL.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.08032 [cs.LG]
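The abstract's sub-frame sampling schedule, which decouples the total denoising budget from the effective horizon, can be illustrated with a minimal sketch. The allocation rule here (a simple round-robin) and the function name are assumptions for illustration only; the paper's actual schedule differs.

```python
def subframe_schedule(horizon, budget):
    # Distribute a total budget of `budget` denoising steps across
    # `horizon` imagined frames. When budget < horizon, the average
    # allocation is below one step per frame: a "sub-frame" budget.
    steps = [0] * horizon
    for i in range(budget):
        steps[i % horizon] += 1
    return steps

print(subframe_schedule(horizon=8, budget=4))  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The total cost is fixed by `budget` alone, so the horizon can be lengthened without increasing the number of denoising steps spent per rollout.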