[2510.26433] Co-Evolving Latent Action World Models
Computer Science > Machine Learning
arXiv:2510.26433 (cs)
[Submitted on 30 Oct 2025 (v1), last revised 6 Apr 2026 (this version, v2)]
Title: Co-Evolving Latent Action World Models
Authors: Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
Abstract: Adapting pretrained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamic model in the LAM with a powerful world model and train the two jointly, but doing so is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pretrained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empiricall...