[2602.08961] MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

arXiv - AI 4 min read


Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.08961 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

Authors: Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng

Abstract: We present MotionCrafter, a framework that leverages video generators to jointly reconstruct 4D geometry and estimate dense motion from a monocular video. The key idea is a joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a 4D VAE tailored to learn this representation effectively. Unlike prior work that strictly aligns 3D values and latents with RGB VAE latents, despite their fundamentally different distributions, we show that such alignment is unnecessary and can hurt performance. Instead, we propose a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments on multiple datasets show that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimizatio...
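To make the idea of a joint point-map and scene-flow representation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names, the (T, H, W, 3) array layout, and the mean-distance normalization are all hypothetical, since the abstract does not specify the actual normalization or the VAE input format.

```python
import numpy as np


def pack_points_and_flow(points, flow, eps=1e-8):
    """Pack per-pixel 3D points and 3D scene flow into one 6-channel array.

    ASSUMPTIONS (not from the paper): both inputs have shape (T, H, W, 3),
    both are expressed in the same coordinate frame, and normalization is a
    single division by the mean distance of the points from the origin.
    """
    points = np.asarray(points, dtype=np.float64)
    flow = np.asarray(flow, dtype=np.float64)
    if points.shape != flow.shape or points.shape[-1] != 3:
        raise ValueError("points and flow must both have shape (T, H, W, 3)")

    # One shared scale for geometry and motion, so that points + flow
    # remains a valid 3D position after normalization.
    scale = np.linalg.norm(points, axis=-1).mean() + eps
    joint = np.concatenate([points / scale, flow / scale], axis=-1)
    return joint, scale


def unpack_points_and_flow(joint, scale):
    """Invert the packing; return the points and the flow-displaced points."""
    points = joint[..., :3] * scale
    flow = joint[..., 3:] * scale
    return points, points + flow


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(2, 4, 5, 3)) + 5.0   # toy point maps for 2 frames
    flw = rng.normal(size=(2, 4, 5, 3)) * 0.1   # toy per-pixel displacements
    joint, scale = pack_points_and_flow(pts, flw)
    print(joint.shape)
```

Using one scale for both halves is the design choice this sketch is meant to show: because points and flow live in the same frame and share a unit, adding the flow channels to the point channels still yields a 3D position.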

Originally published on March 31, 2026. Curated by AI News.

