[2510.10125] Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Computer Science > Robotics
arXiv:2510.10125 (cs)
[Submitted on 11 Oct 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Authors: Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn

Abstract: Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to roll out within imagination space. However, a key challenge is building a controllable world model that can handle multi-step interactions with generalist robot policies. This requires a world model that is compatible with modern generalist policies: it must support multi-view prediction, fine-grained action control, and consistent long-horizon interactions, a combination that previous work has not achieved. In this paper, we take a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability...