[2602.02334] VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Summary
The paper presents VQ-Style, a method that disentangles style and content in human motion data using Residual Vector Quantized Variational Autoencoders (RVQ-VAEs), enabling style transfer without fine-tuning for unseen styles.
Why It Matters
This research addresses the complex challenge of modeling human motion by separating stylistic features from semantic content. The proposed method has significant implications for applications in animation, robotics, and virtual reality, where realistic motion representation is crucial.
Key Takeaways
- Introduces a novel method for disentangling style and content in motion data.
- Utilizes Residual Vector Quantized Variational Autoencoders for effective representation.
- Enhances style transfer capabilities without fine-tuning for unseen styles.
- Demonstrates versatility in applications like style transfer and motion blending.
- Integrates contrastive learning and information leakage loss for improved disentanglement.
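The core mechanism behind the takeaways above is residual vector quantization: each codebook quantizes the residual left over by the previous one, so early codebooks capture coarse structure (content) and later codebooks capture fine detail (style). A minimal sketch of that coarse-to-fine quantization, using random placeholder codebooks rather than the paper's learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
num_codebooks, codebook_size, dim = 4, 8, 16

# Random stand-in codebooks (in an RVQ-VAE these are learned).
# A zero codeword is included so a quantization step never has to
# make the residual worse.
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))
codebooks[:, 0] = 0.0

def rvq_encode(z, codebooks):
    """Quantize z layer by layer; return chosen indices and reconstruction."""
    residual = z.copy()
    indices, recon = [], np.zeros_like(z)
    for cb in codebooks:
        # pick the codeword nearest to the current residual
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        recon += cb[idx]
        residual = residual - cb[idx]
    return indices, recon

z = rng.normal(size=dim)
indices, recon = rvq_encode(z, codebooks)
print("codes per layer:", indices)
```

Each successive layer can only refine the reconstruction, which is what lets the hierarchy separate coarse content codes from fine style codes.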
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.02334 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Authors: Fatemeh Zargarbashi, Dhruv Agrawal, Jakob Buhmann, Martin Guay, Stelian Coros, Robert W. Sumner
Abstract: Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model. We propose a novel method for effective disentanglement of the style and content in human motion data to facilitate style transfer. Our approach is guided by the insight that content corresponds to coarse motion attributes while style captures the finer, expressive details. To model this hierarchy, we employ Residual Vector Quantized Variational Autoencoders (RVQ-VAEs) to learn a coarse-to-fine representation of motion. We further enhance the disentanglement by integrating codebook learning with contrastive learning and a novel information leakage loss to organize the content and the style across different codebooks. We harness this disentangled representation using our simple and effective inference-time technique Quantized Code Swapping, which enables motion style transfer wit...
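The abstract's "Quantized Code Swapping" can be sketched in a few lines: keep the coarse (content) codes from one sequence and substitute the fine (style) codes from another, then decode by summing the corresponding codewords. The split index and the random codebooks below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
num_codebooks, codebook_size, dim = 4, 8, 16
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))

def decode(indices, codebooks):
    """RVQ decoding: the latent is the sum of the selected codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

content_codes = [2, 5, 1, 7]   # codes from the content motion
style_codes = [4, 0, 6, 3]     # codes from the style motion
split = 2                      # first `split` codebooks treated as content

# swap in the fine-grained (style) codes at inference time
swapped = content_codes[:split] + style_codes[split:]
z_transfer = decode(swapped, codebooks)
```

Because the swap happens purely on discrete codes at inference time, no retraining is needed for a new style, which matches the paper's claim of handling unseen styles without fine-tuning.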