[2602.19634] Compositional Planning with Jumpy World Models
Summary
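This paper studies compositional planning, in which an agent sequences pre-trained policies as temporally extended actions. It relies on jumpy world models, which predict the states each policy visits across multiple timescales, and trains them with a consistency objective that improves long-horizon predictive accuracy and, in turn, performance on complex tasks.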
Why It Matters
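Planning with temporal abstractions is central to intelligent decision-making. By composing pre-trained policies into temporally extended actions, agents can solve complex tasks that no single policy solves alone. Doing so requires estimating which states a sequence of policies will visit, and this is hard because prediction errors compound over long horizons. Addressing that problem matters for robotics and other settings where tasks demand long-horizon planning.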
Key Takeaways
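- Learns jumpy world models: predictive models of multi-step dynamics that capture, off-policy, the state occupancies induced by pre-trained policies across multiple timescales.
- Extends Temporal Difference Flows (arXiv:2503.09817) with a novel consistency objective that aligns predictions across timescales, improving long-horizon predictive accuracy.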
- Demonstrates significant performance improvements in manipulation and navigation tasks.
- Achieves an average relative improvement of 200% over traditional planning methods.
- Builds on the geometric policy composition framework (arXiv:2206.08736) and shows how to combine the models' generative predictions for value estimation.
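The jumpy-model and consistency takeaways above can be made concrete in a tiny tabular setting, using exact matrices instead of the learned generative models the paper trains. The sketch assumes the convention m_γ = (1 - γ) Σ_{t≥0} γ^t P^{t+1} for the discounted state occupancy of one fixed policy with transition matrix P, so row s of m_γ is the distribution of the state reached after a geometrically distributed number of steps. Occupancies at two discounts γ_s < γ_l then satisfy the exact relation m_{γ_l} = (1 - λ) m_{γ_s} + λ m_{γ_s} m_{γ_l} with λ = (γ_l - γ_s) / (1 - γ_s): a long jump is a short jump followed, with probability λ, by another long jump. This is only one example of a cross-timescale relation that a consistency objective could exploit; the paper's actual objective may differ. The 3-state chain and all function names are hypothetical.

```python
# Toy check of a cross-timescale relation between discounted state
# occupancies of one fixed policy. Exact tabular matrices stand in for
# learned generative models; the 3-state chain is hypothetical.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def occupancy(p: Matrix, gamma: float, horizon: int = 2000) -> Matrix:
    """m_gamma = (1 - gamma) * sum_{t>=0} gamma**t * P**(t+1), truncated.

    Row s is the distribution of the state reached from s after T >= 1
    steps, where P(T = k) = (1 - gamma) * gamma**(k - 1).
    """
    n = len(p)
    term = [row[:] for row in p]   # holds P**(t+1), starting at t = 0
    m = [[0.0] * n for _ in range(n)]
    weight = 1.0 - gamma           # equals (1 - gamma) * gamma**t
    for _ in range(horizon):
        for i in range(n):
            for j in range(n):
                m[i][j] += weight * term[i][j]
        term = matmul(term, p)
        weight *= gamma
    return m


def consistency_residual(p: Matrix, gamma_short: float, gamma_long: float) -> float:
    """Max |m_long - ((1 - lam) m_short + lam m_short @ m_long)| over entries."""
    lam = (gamma_long - gamma_short) / (1.0 - gamma_short)
    m_s = occupancy(p, gamma_short)
    m_l = occupancy(p, gamma_long)
    composed = matmul(m_s, m_l)
    n = len(p)
    return max(abs(m_l[i][j] - ((1 - lam) * m_s[i][j] + lam * composed[i][j]))
               for i in range(n) for j in range(n))


P = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.3, 0.0, 0.7]]

print(consistency_residual(P, 0.5, 0.95))  # near zero: round-off only
```

Setting γ_s = 0 gives m_0 = P and λ = γ_l, which recovers the familiar one-step temporal-difference relation m_γ = (1 - γ) P + γ P m_γ; larger γ_s lets a model bootstrap from its own shorter-horizon predictions instead of single steps.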
arXiv:2602.19634 (Computer Science, Machine Learning). Submitted on 23 Feb 2026.
Authors: Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Marc G. Bellemare, Alessandro Lazaric, Ahmed Touati
Abstract
The ability to plan with temporal abstractions is central to intelligent decision-making. Rather than reasoning over primitive actions, we study agents that compose pre-trained policies as temporally extended actions, enabling solutions to complex tasks that no constituent alone can solve. Such compositional planning remains elusive as compounding errors in long-horizon predictions make it challenging to estimate the visitation distribution induced by sequencing policies. Motivated by the geometric policy composition framework introduced in arXiv:2206.08736, we address these challenges by learning predictive models of multi-step dynamics -- so-called jumpy world models -- that capture state occupancies induced by pre-trained policies across multiple timescales in an off-policy manner. Building on Temporal Difference Flows (arXiv:2503.09817), we enhance these models with a novel consistency objective that aligns predictions across timescales, improving long-horizon predictive accuracy. We further demonstrate how to combine these generative predictions to estimate the value...
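The abstract's final step, combining per-policy predictions into a value for a sequence of policies, has a simple exact analogue in the tabular case. The sketch below assumes a geometric switching rule chosen for illustration: after each step the agent moves from π_k to π_{k+1} with probability 1 - α, and the last policy runs forever. Under that rule the value decomposes backward as V_n = (I - γP_n)^{-1} r and V_k = (I - αγP_k)^{-1} (r + γ(1 - α) P_k V_{k+1}), where each (I - βP_k)^{-1} is, up to a factor 1 - β, the discounted state occupancy of π_k at discount β, the kind of quantity a jumpy world model approximates by sampling. This is not the paper's estimator; the matrices, reward, and function names are hypothetical, and the result is checked against direct evaluation of the joint (stage, state) chain.

```python
# Tabular sketch: value of running policies pi_1, ..., pi_n in order, where
# after each step the agent switches from pi_k to pi_{k+1} with probability
# (1 - alpha) and the last policy runs forever. The switching rule, matrices,
# and reward are hypothetical illustrations.

Matrix = list[list[float]]
Vector = list[float]


def apply(p: Matrix, v: Vector) -> Vector:
    """Matrix-vector product P @ v."""
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in p]


def solve_discounted(p: Matrix, beta: float, b: Vector, iters: int = 3000) -> Vector:
    """Approximate x = (I - beta P)^{-1} b by iterating x <- b + beta P x (beta < 1)."""
    x = b[:]
    for _ in range(iters):
        x = [bi + beta * pxi for bi, pxi in zip(b, apply(p, x))]
    return x


def composed_value(policies: list[Matrix], r: Vector,
                   gamma: float, alpha: float) -> Vector:
    """Backward recursion that touches one policy's operators at a time:
    V_n = (I - gamma P_n)^{-1} r
    V_k = (I - alpha gamma P_k)^{-1} (r + gamma (1 - alpha) P_k V_{k+1})
    """
    v = solve_discounted(policies[-1], gamma, r)
    for p in reversed(policies[:-1]):
        b = [ri + gamma * (1 - alpha) * pvi for ri, pvi in zip(r, apply(p, v))]
        v = solve_discounted(p, alpha * gamma, b)
    return v  # value of each start state when beginning with pi_1


def joint_value(policies: list[Matrix], r: Vector,
                gamma: float, alpha: float, iters: int = 3000) -> Vector:
    """Reference: policy evaluation on the joint (stage, state) chain."""
    n = len(policies)
    v = [[0.0] * len(r) for _ in range(n)]
    for _ in range(iters):
        new = []
        for k, p in enumerate(policies):
            stay = alpha if k < n - 1 else 1.0   # last policy never switches
            nxt = min(k + 1, n - 1)
            mix = [stay * cur + (1 - stay) * nx for cur, nx in zip(v[k], v[nxt])]
            new.append([ri + gamma * x for ri, x in zip(r, apply(p, mix))])
        v = new
    return v[0]


# Hypothetical example: pi_1 drifts toward rewarding state 2, pi_2 drifts back.
P1 = [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]]
P2 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
r = [0.0, 0.0, 1.0]
print(composed_value([P1, P2], r, gamma=0.9, alpha=0.8))
print(joint_value([P1, P2], r, gamma=0.9, alpha=0.8))  # should agree
```

Because each stage of the recursion uses only one policy's operators, evaluating a different ordering of the same policies reruns a short backward pass rather than solving the larger joint chain.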