[2601.05848] Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
Computer Science > Computer Vision and Pattern Recognition

arXiv:2601.05848 (cs)

[Submitted on 9 Jan 2026 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Authors: Nate Gillman, Yinghua Zhou, Zitian Tang, Evan Luo, Arjan Chakravarthy, Daksh Aggarwal, Michael Freeman, Charles Herrmann, Chen Sun

Abstract: Recent advances in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge: text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominoes, teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object...
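The abstract does not specify how a force-vector goal is fed to the video model, but a common pattern for spatial conditioning signals is to rasterize them into per-frame maps concatenated with the model's latents. Below is a minimal, hypothetical sketch of that idea; the function names, the Gaussian footprint, the channel counts, and the choice of concatenation are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): encode a "goal force" --
# an application point plus a force direction/magnitude -- as a 2-channel
# spatial map that a video backbone could consume as conditioning.
import torch

def rasterize_force(point_xy, force_xy, height, width, sigma=8.0):
    """Paint an (fx, fy) map with a Gaussian bump at the application
    point, so the model sees where the push happens and how hard it is."""
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    px, py = point_xy
    dist2 = (xs - px) ** 2 + (ys - py) ** 2
    bump = torch.exp(-dist2 / (2 * sigma ** 2))       # (H, W) spatial weight
    fx, fy = force_xy
    return torch.stack([bump * fx, bump * fy])        # (2, H, W)

# Example: a unit rightward push applied at pixel (48, 32), broadcast
# over T frames and concatenated with noisy latents channel-wise.
T, H, W = 16, 64, 64
force_map = rasterize_force((48.0, 32.0), (1.0, 0.0), H, W)   # (2, H, W)
cond = force_map.unsqueeze(0).expand(T, -1, -1, -1)           # (T, 2, H, W)
noisy_latents = torch.randn(T, 4, H, W)                       # assumed latent shape
model_input = torch.cat([noisy_latents, cond], dim=1)         # (T, 6, H, W)
```

Rasterizing the force into image space, rather than passing raw vectors, keeps the conditioning aligned with the pixels it should influence, which is one plausible way a model trained on simple causal primitives could learn to propagate forces spatially.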