[2509.15130] Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
Computer Science > Graphics

arXiv:2509.15130 (cs)

[Submitted on 18 Sep 2025 (v1), last revised 21 Mar 2026 (this version, v3)]

Title: Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control

Authors: Chenxi Song, Yanming Yang, Tong Zhao, Ruibo Li, Chi Zhang

Abstract: Video diffusion models carry rich world priors, but their use in spatial tasks is limited by poor controllability, spatially and temporally inconsistent results, and entangled scene and camera dynamics. Current approaches, such as per-task fine-tuning or post-process warping, often introduce visual artifacts, fail to generalize, or incur high computational costs. We introduce WorldForge, a novel, training-free framework that operates purely at inference time to resolve these issues. Our method comprises three synergistic components. First, an intra-step refinement loop injects fine-grained motion guidance during the denoising process, iteratively correcting the output to ensure strict adherence to the target camera path. Second, an optical flow-based analysis identifies and isolates motion-related channels within the latent space, allowing the framework to apply guidance selectively, thereby decoupling motion from appearance and preserving visual fidelity. Third, a dual-path guidance strategy adaptively corrects for drift by comparing the guided ...
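To make the abstract's three components concrete, below is a minimal PyTorch sketch of how such an inference-time pipeline could be wired together: an intra-step refinement loop, flow-based selection of motion-related latent channels, and a dual-path (guided vs. unguided) update. Every name here (denoise_step, warp_to_camera_path, flow_channel_mask) and the channel-masking heuristic are illustrative assumptions, not the paper's actual implementation.

    # Hypothetical sketch only; stand-ins for the components described in
    # the abstract, not the authors' code.
    import torch

    def denoise_step(model, latents, t):
        # Placeholder for one sampler step of a pretrained video diffusion model.
        return latents - 0.01 * model(latents, t)

    def warp_to_camera_path(latents, t):
        # Placeholder: the real method would warp latents to follow the target
        # camera trajectory; a spatial shift stands in for that here.
        return torch.roll(latents, shifts=1, dims=-1)

    def flow_channel_mask(latents_t, latents_prev, top_k=8):
        # Stand-in for the optical-flow-based analysis: rank latent channels by
        # how much they change across steps, keep the most motion-correlated ones.
        delta = (latents_t - latents_prev).abs().mean(dim=(0, 2, 3))
        mask = torch.zeros_like(delta)
        mask[delta.topk(top_k).indices] = 1.0
        return mask.view(1, -1, 1, 1)

    def guided_denoising(latents, model, num_steps=50, refine_iters=3, scale=0.5):
        prev = latents.clone()
        for t in range(num_steps):
            # Dual path: keep an unguided trajectory as a drift reference.
            free = denoise_step(model, latents, t)
            guided = free.clone()
            mask = flow_channel_mask(guided, prev)
            # Intra-step refinement: nudge the partially denoised latents toward
            # the target camera path several times before committing the step.
            for _ in range(refine_iters):
                target = warp_to_camera_path(guided, t)
                guided = guided + scale * mask * (target - guided)
            # Guide only the motion channels; appearance channels follow the
            # unguided path, decoupling motion from appearance.
            latents = mask * guided + (1 - mask) * free
            prev = latents.clone()
        return latents

    # Toy usage with a random stand-in for the pretrained video model.
    model = lambda x, t: torch.tanh(x)
    out = guided_denoising(torch.randn(1, 16, 32, 32), model, num_steps=5)

The point of the sketch is the control flow: guidance is injected repeatedly within each denoising step rather than once per step, and the per-channel mask confines that guidance to the latent dimensions that encode motion.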