[2602.18694] In-Context Planning with Latent Temporal Abstractions
Summary
The paper presents I-TAP (In-Context Latent Temporal-Abstraction Planner), an offline reinforcement learning framework that improves planning in continuous control by operating over latent temporal abstractions, addressing the challenges posed by partially observable environments.
Why It Matters
This research tackles two practical limitations of planning-based reinforcement learning: planning at primitive time scales blows up the branching factor and horizon, and real-world environments are often partially observable with regime shifts that break stationary, fully observed dynamics assumptions. By planning in a compact space of temporal abstractions while adapting in context from recent history, I-TAP improves planning efficiency and robustness, which could benefit robotics and other applications where conditions change at deployment time.
Key Takeaways
- I-TAP integrates in-context adaptation with online planning using latent temporal abstractions.
- The framework compresses each observation-macro-action segment into a coarse-to-fine stack of discrete residual tokens via an observation-conditioned residual-quantization VAE.
- I-TAP demonstrates superior performance in various environments compared to existing offline baselines.
- The approach runs Monte Carlo Tree Search directly in token space, using a temporal Transformer as both a context-conditioned prior over abstract actions and a latent dynamics model.
- This research could influence future developments in reinforcement learning and AI applications.
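The coarse-to-fine token compression mentioned above can be illustrated with a minimal residual quantizer. This is a toy sketch, not the paper's RQ-VAE: the random codebooks and the latent vector here are stand-ins, and a real model would learn the codebooks jointly with an encoder and decoder.

```python
import numpy as np

def residual_quantize(z, codebooks):
    """Quantize a latent vector into a coarse-to-fine stack of discrete tokens.

    Each level's codebook quantizes the residual left over from the previous
    level, so early tokens capture coarse structure and later ones refine it.
    """
    tokens, residual = [], z.copy()
    for codebook in codebooks:                        # one codebook per level
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                   # nearest code = token
        tokens.append(idx)
        residual = residual - codebook[idx]           # pass the residual down
    return tokens

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]  # 3 levels, 8 codes, dim 4
z = rng.normal(size=4)
tokens = residual_quantize(z, codebooks)
print(tokens)  # one discrete token index per level, coarse to fine
```

Because each level only has to encode what the previous levels missed, a short stack of small codebooks can represent a segment far more compactly than a single large codebook of the same fidelity.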
Paper Details
Computer Science > Machine Learning · arXiv:2602.18694 (cs) · Submitted on 21 Feb 2026
Title: In-Context Planning with Latent Temporal Abstractions
Authors: Baiting Luo, Yunuo Zhang, Nathaniel S. Keplinger, Samir Gupta, Abhishek Dubey, Ayan Mukhopadhyay
Abstract: Planning-based reinforcement learning for continuous control is bottlenecked by two practical issues: planning at primitive time scales leads to prohibitive branching and long horizons, while real environments are frequently partially observable and exhibit regime shifts that invalidate stationary, fully observed dynamics assumptions. We introduce I-TAP (In-Context Latent Temporal-Abstraction Planner), an offline RL framework that unifies in-context adaptation with online planning in a learned discrete temporal-abstraction space. From offline trajectories, I-TAP learns an observation-conditioned residual-quantization VAE that compresses each observation-macro-action segment into a coarse-to-fine stack of discrete residual tokens, and a temporal Transformer that autoregressively predicts these token stacks from a short recent history. The resulting sequence model acts simultaneously as a context-conditioned prior over abstract actions and a latent dynamics model. At test time, I-TAP performs Monte Carlo Tree Search directly in token space, using short histories for implici...
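The abstract's search over token sequences can be sketched as a toy PUCT-style Monte Carlo Tree Search where actions are discrete tokens. The `prior_fn` and `value_fn` below are hypothetical stand-ins for the paper's learned Transformer prior and value estimate; this shows the mechanics of tree search in token space, not I-TAP's actual planner.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # prior probability from the sequence model
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}          # token index -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.0):
    # PUCT rule: balance high-value children against high-prior, less-visited ones
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=score)

def mcts(root, prior_fn, value_fn, n_tokens=8, simulations=50):
    for _ in range(simulations):
        node, path, seq = root, [root], []
        while node.children:                       # selection down the tree
            tok, node = select_child(node)
            path.append(node)
            seq.append(tok)
        priors = prior_fn(seq)                     # expansion: prior over next token
        for tok in range(n_tokens):
            node.children[tok] = Node(priors[tok])
        v = value_fn(seq)                          # evaluation of the token sequence
        for n in path:                             # backup along the visited path
            n.visits += 1
            n.value_sum += v
    best_tok, _ = max(root.children.items(), key=lambda kv: kv[1].visits)
    return best_tok

root = Node(prior=1.0)
uniform_prior = lambda seq: [1.0 / 8] * 8                    # stand-in for the Transformer prior
toy_value = lambda seq: sum(seq) / (8 * max(len(seq), 1))    # toy value favouring high tokens
best = mcts(root, uniform_prior, toy_value)
print(best)
```

Because each "action" is a token rather than a primitive control step, one tree edge covers an entire macro-action segment, which is what keeps the branching factor and effective horizon manageable.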