[2604.03299] MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.03299 (cs)

[Submitted on 29 Mar 2026]

Title: MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement

Authors: Yejia Liu, Hengle Jiang, Haoxian Liu, Runxi Huang, Xiaomin Ouyang

Abstract: 3D human pose estimation is a key enabling technology for applications such as healthcare monitoring, human-robot collaboration, and immersive gaming, but real-world deployment remains challenged by viewpoint variations. Existing methods struggle to generalize to unseen camera viewpoints, require large amounts of training data, and suffer from high inference latency. We propose MoViD, a viewpoint-invariant 3D human pose estimation framework that disentangles viewpoint information from motion features. The key idea is to extract viewpoint information from intermediate pose features and leverage it to enhance both the robustness and efficiency of pose estimation. MoViD introduces a view estimator that models key joint relationships to predict viewpoint information, and an orthogonal projection module to disentangle motion and view features, further enhanced through physics-grounded contrastive alignment across views. For real-time edge deployment, MoViD employs a frame-by-frame inference pipeline with a view-aware strategy that adapti...
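The abstract's orthogonal projection module removes the view-aligned component from motion features. The paper does not give implementation details, but the core operation can be sketched as projecting a motion feature onto the subspace orthogonal to a view feature; all function and variable names below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def orthogonal_disentangle(motion_feat, view_feat, eps=1e-8):
    """Hypothetical sketch of motion-view disentanglement:
    subtract the component of motion_feat that lies along
    view_feat, leaving a view-orthogonal motion feature."""
    n = np.linalg.norm(view_feat)
    if n < eps:
        # degenerate view feature: nothing to remove
        return motion_feat
    v = view_feat / n
    # projection of motion onto the view direction, then removed
    return motion_feat - np.dot(motion_feat, v) * v

m = np.array([1.0, 2.0, 3.0])   # toy motion feature
v = np.array([0.0, 0.0, 1.0])   # toy view feature
m_perp = orthogonal_disentangle(m, v)
print(m_perp)  # → [1. 2. 0.], the view (z) component is gone
```

The disentangled feature is exactly orthogonal to the view direction, which is the property the contrastive alignment described in the abstract would then exploit across camera views.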