[2603.17808] EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards
Computer Science > Robotics

arXiv:2603.17808 (cs)

[Submitted on 18 Mar 2026 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards

Authors: Ruixiang Wang, Qingming Liu, Yueci Deng, Guiliang Liu, Zhen Liu, Kui Jia

Abstract: Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the current observation and task instruction, and an inverse dynamics model (IDM) converts the generated frames into executable robot actions. However, current video world models lack explicit executability constraints. As a result, visually coherent rollouts may still violate rigid-body and kinematic consistency, producing unstable or infeasible control commands when decoded by an IDM. We refer to this mismatch between visual generation and physically executable control as the executability gap. While this gap can be mitigated at inference time with techniques such as rejection sampling, such approaches are inefficient due to the high cost of video generation. In this paper, we leverage the executability gap as a training signal and introduce Executable Video Alignment (EVA), a reinforcement-learning post-training framework for aligning vi...
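The pipeline the abstract describes (world model generates a rollout, IDM decodes it into actions, and an executability signal scores the mismatch) can be sketched with toy stand-ins. Everything below is a hedged illustration under assumed names: `VideoWorldModel`, `InverseDynamicsModel`, and `executability_reward` are hypothetical and are not the paper's actual API; the "frames" are simplified to scalar frame-to-frame displacements.

```python
import random

random.seed(0)

class VideoWorldModel:
    """Toy world model: a 'rollout' is a list of frame-to-frame displacements."""
    def rollout(self, horizon=5):
        # Occasionally emit a physically implausible jump, mimicking a
        # visually coherent but non-executable generated motion.
        return [random.gauss(0.0, 1.0) * (10.0 if random.random() < 0.3 else 1.0)
                for _ in range(horizon)]

class InverseDynamicsModel:
    """Toy IDM: decodes displacements into actions, clamped to what the
    robot can actually execute."""
    MAX_ACTION = 2.0

    def decode(self, frames):
        return [max(-self.MAX_ACTION, min(self.MAX_ACTION, f)) for f in frames]

def executability_reward(frames, idm):
    # Penalize the mismatch between the generated motion and the action the
    # IDM can express: a simplified proxy for the "executability gap".
    actions = idm.decode(frames)
    gap = sum(abs(f - a) for f, a in zip(frames, actions))
    return -gap

wm, idm = VideoWorldModel(), InverseDynamicsModel()

# Inference-time rejection sampling, as mentioned in the abstract: generate
# many rollouts and keep the most executable one. EVA instead uses this kind
# of reward during RL post-training, amortizing the generation cost.
candidates = [wm.rollout() for _ in range(16)]
best = max(candidates, key=lambda fr: executability_reward(fr, idm))
print(f"best rollout reward: {executability_reward(best, idm):.3f}")
```

The contrast the abstract draws is between paying this selection cost at every inference call versus folding the same reward into post-training so the world model learns to produce executable rollouts directly.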