[2603.03485] Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.03485 (cs) [Submitted on 3 Mar 2026]

Title: Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

Authors: Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu

Abstract: Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work, we present \textbf{Phys4D}, a pipeline for learning physics-consistent 4D world representations from video diffusion models. Phys4D adopts \textbf{a three-stage training paradigm} that progressively lifts appearance-driven video diffusion models into physics-consistent 4D world representations. We first bootstrap robust geometry and motion representations through large-scale pseudo-supervised pretraining, establishing a foundation for 4D scene modeling. We then perform physics-grounded supervised fine-tuning using simulation-generated data, enforcing temporally consistent 4D dynamics. Finally, we apply simulation-grounded reinforcement learning to correct residual physical violations that are difficult to capture through expli...
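The abstract's three-stage paradigm (pseudo-supervised pretraining, physics-grounded supervised fine-tuning, then simulation-grounded RL correction) can be sketched as a generic staged-refinement loop. This is a minimal, hypothetical illustration only, not the authors' implementation: the scalar "weights", learning rates, reward function, and all function names are invented stand-ins for the actual diffusion-model training described in the paper.

```python
# Hypothetical three-stage training skeleton (illustrative, NOT Phys4D's code).
# A single scalar stands in for the model's parameters; each stage refines it.

def pretrain(weights, pseudo_labels, lr=0.1):
    # Stage 1 stand-in: bootstrap representations from pseudo-supervision.
    for target in pseudo_labels:
        weights = weights + lr * (target - weights)  # gradient-step stand-in
    return weights

def physics_sft(weights, sim_targets, lr=0.05):
    # Stage 2 stand-in: supervised fine-tuning on simulation-generated targets.
    for target in sim_targets:
        weights = weights + lr * (target - weights)
    return weights

def simulation_rl(weights, reward_fn, steps=50, lr=0.01, eps=1e-3):
    # Stage 3 stand-in: RL-style correction driven by a simulator reward,
    # using a finite-difference gradient estimate and reward ascent.
    for _ in range(steps):
        grad = (reward_fn(weights + eps) - reward_fn(weights - eps)) / (2 * eps)
        weights = weights + lr * grad
    return weights

# Toy run: each stage moves the parameter closer to a "physically correct" 1.0.
w = 0.0
w = pretrain(w, pseudo_labels=[1.0, 1.2, 0.9])
w = physics_sft(w, sim_targets=[1.1, 1.05])
w = simulation_rl(w, reward_fn=lambda x: -(x - 1.0) ** 2)
print(w)
```

The point of the sketch is only the control flow: the same parameters pass through three successively more physics-grounded supervision signals, with the final RL stage optimizing a reward rather than matching explicit targets.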