[2602.02334] VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Summary
The paper presents VQ-Style, a method that disentangles style and content in human motion data using Residual Vector Quantized Variational Autoencoders (RVQ-VAEs), enabling style transfer without fine-tuning for unseen styles.
Why It Matters
This research addresses the complex challenge of modeling human motion by separating stylistic features from semantic content. The proposed method has significant implications for applications in animation, robotics, and virtual reality, where realistic motion representation is crucial.
Key Takeaways
- Introduces a novel method for disentangling style and content in motion data.
- Utilizes Residual Vector Quantized Variational Autoencoders for effective representation.
- Enhances style transfer capabilities without fine-tuning for unseen styles.
- Demonstrates versatility in applications like style transfer and motion blending.
- Integrates contrastive learning and information leakage loss for improved disentanglement.
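The core mechanism behind the takeaways above is residual vector quantization: each codebook quantizes the residual left over by the previous one, so early codebooks capture coarse structure (content) and later codebooks capture fine detail (style). A minimal sketch of that coarse-to-fine quantization, using random placeholder codebooks rather than the paper's learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
num_codebooks, codebook_size, dim = 4, 8, 16

# Random stand-in codebooks (in an RVQ-VAE these are learned).
# A zero codeword is included so a quantization step never has to
# make the residual worse.
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))
codebooks[:, 0] = 0.0

def rvq_encode(z, codebooks):
    """Quantize z layer by layer; return chosen indices and reconstruction."""
    residual = z.copy()
    indices, recon = [], np.zeros_like(z)
    for cb in codebooks:
        # pick the codeword nearest to the current residual
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        recon += cb[idx]
        residual = residual - cb[idx]
    return indices, recon

z = rng.normal(size=dim)
indices, recon = rvq_encode(z, codebooks)
print("codes per layer:", indices)
```

Each successive layer can only refine the reconstruction, which is what lets the hierarchy separate coarse content codes from fine style codes.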
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.02334 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Authors: Fatemeh Zargarbashi, Dhruv Agrawal, Jakob Buhmann, Martin Guay, Stelian Coros, Robert W. Sumner
Abstract: Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model. We propose a novel method for effective disentanglement of the style and content in human motion data to facilitate style transfer. Our approach is guided by the insight that content corresponds to coarse motion attributes while style captures the finer, expressive details. To model this hierarchy, we employ Residual Vector Quantized Variational Autoencoders (RVQ-VAEs) to learn a coarse-to-fine representation of motion. We further enhance the disentanglement by integrating codebook learning with contrastive learning and a novel information leakage loss to organize the content and the style across different codebooks. We harness this disentangled representation using our simple and effective inference-time technique Quantized Code Swapping, which enables motion style transfer wit...
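The abstract's "Quantized Code Swapping" can be sketched in a few lines: keep the coarse (content) codes from one sequence and substitute the fine (style) codes from another, then decode by summing the corresponding codewords. The split index and the random codebooks below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
num_codebooks, codebook_size, dim = 4, 8, 16
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))

def decode(indices, codebooks):
    """RVQ decoding: the latent is the sum of the selected codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

content_codes = [2, 5, 1, 7]   # codes from the content motion
style_codes = [4, 0, 6, 3]     # codes from the style motion
split = 2                      # first `split` codebooks treated as content

# swap in the fine-grained (style) codes at inference time
swapped = content_codes[:split] + style_codes[split:]
z_transfer = decode(swapped, codebooks)
```

Because the swap happens purely on discrete codes at inference time, no retraining is needed for a new style, which matches the paper's claim of handling unseen styles without fine-tuning.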