[2603.02129] LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.02129 (cs)
[Submitted on 2 Mar 2026]

Title: LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
Authors: Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li

Abstract: We present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D consistency and controllability. As a plug-and-play enhancer, LiftAvatar directly addresses the limited expressiveness and reconstruction artifacts of 3D...