[2511.00062] World Simulation with Video Foundation Models for Physical AI
Summary
The paper presents Cosmos-Predict2.5, a video foundation model for world simulation in Physical AI. It unifies text-, image-, and video-conditioned generation in a single model and improves video quality and instruction alignment.
Why It Matters
Synthetic data generation and simulation are central to developing robotics and autonomous systems, and this work improves both. The open-source release lowers barriers for researchers and developers working on embodied intelligence.
Key Takeaways
- Cosmos-Predict2.5 integrates Text2World, Image2World, and Video2World generation in a single model.
- It is trained on 200M curated video clips and refined with reinforcement learning, improving video quality and instruction alignment.
- The model supports reliable synthetic data generation and closed-loop simulation for robotics.
- Cosmos-Transfer2.5, a companion model, provides robust world translation (Sim2Real and Real2Real).
- Open resources are provided to accelerate research and deployment in Physical AI.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.00062 (cs)
[Submitted on 28 Oct 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: World Simulation with Video Foundation Models for Physical AI
Authors: NVIDIA: Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, Prithvijit Chattopadhyay, Mike Chen, Yongxin Chen, Yu Chen, Shuai Cheng, Yin Cui, Jenna Diamond, Yifan Ding, Jiaojiao Fan, Linxi Fan, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Ruiyuan Gao, Yunhao Ge, Jinwei Gu, Aryaman Gupta, Siddharth Gururani, Imad El Hanafi, Ali Hassani, Zekun Hao, Jacob Huffman, Joel Jang, Pooya Jannaty, Jan Kautz, Grace Lam, Xuan Li, Zhaoshuo Li, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Yen-Chen Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Seungjun Nah, Yashraj Narang, Abhijeet Panaskar, Lindsey Pavao, Trung Pham, Morteza Ramezanali, Fitsum Reda, Scott Reed, Xuanchi Ren, Haonan Shao, Yue Shen, Stella Shi, Shuran Song, Bartosz Stefaniak, Shangkun Sun, Shitao Tang, Sameena Tasmeen, Lyne Tchapmi, Wei-Cheng Tseng, Jibin Varghese, Andrew Z. Wang, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Jiashu Xu, Dinghao Yang, Xiaodong Yang, Haotian Ye, Seonghyeon Ye, Xiaohui Zeng, Jing Zhang, Qinsheng Zhang, Kaiwen Zheng, Andrew Zhu, Yuke Zhu