[2602.10058] Evaluating Disentangled Representations for Controllable Music Generation

arXiv - Machine Learning

Summary

This article evaluates disentangled representations in music generation models, assessing how effective they are for controllable synthesis and identifying inconsistencies between the embeddings' intended and actual semantics.

Why It Matters

Understanding the properties of disentangled representations is crucial for advancing controllable music generation. This research highlights the limitations of current methods, prompting a reevaluation of strategies for improving the semantic clarity and usability of music generation models.

Key Takeaways

  • Disentangled representations are essential for controllable music synthesis.
  • Current models show inconsistencies between intended and actual semantics.
  • The study evaluates models using a probing-based framework across multiple axes.
  • Insights gained may inform future strategies for improving music generation.
  • A re-examination of controllability approaches is necessary for better outcomes.

Computer Science > Sound

arXiv:2602.10058 (cs) [Submitted on 10 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v2)]

Title: Evaluating Disentangled Representations for Controllable Music Generation

Authors: Laura Ibáñez-Martínez, Chukwuemeka Nkama, Andrea Poltronieri, Xavier Serra, Martín Rocamora

Abstract: Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying properties of these embeddings remain underexplored. In this work, we evaluate such disentangled representations in a set of music audio models for controllable generation using a probing-based framework that goes beyond standard downstream tasks. The selected models reflect diverse unsupervised disentanglement strategies, including inductive biases, data augmentations, adversarial objectives, and staged training procedures. We further isolate specific strategies to analyze their effect. Our analysis spans four key axes: informativeness, equivariance, invariance, and disentanglement, which are assessed across datasets, tasks, and controlled transformations. Our findings reveal inconsistencies between intended and actual semantics of the embeddings, suggesting that current strategies fall short of pr...
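To make two of the abstract's evaluation axes concrete, here is a minimal sketch of what probing-style checks can look like: a linear softmax probe trained on frozen embeddings estimates informativeness, and a cosine-similarity comparison of embeddings before and after a controlled transformation estimates invariance. The embeddings, labels, and transformation below are synthetic placeholders, not the paper's actual models, datasets, or metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 256-dim "structure" embeddings for 500 clips,
# each labeled with a discrete attribute (e.g. a pitch-class label).
n, dim, n_classes = 500, 256, 8
labels = rng.integers(0, n_classes, size=n)
# Synthetic embeddings that partially encode the label.
basis = rng.normal(size=(n_classes, dim))
emb = basis[labels] + 0.5 * rng.normal(size=(n, dim))

def probe_accuracy(X, y, n_classes, epochs=200, lr=0.1):
    """Informativeness proxy: accuracy of a linear softmax probe
    trained on frozen embeddings by gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # Gradient of mean cross-entropy w.r.t. W.
        W -= lr * X.T @ (p - Y) / len(X)
    return float(((X @ W).argmax(axis=1) == y).mean())

def invariance(emb_a, emb_b):
    """Invariance proxy: mean cosine similarity between embeddings of a
    clip and its transformed version (1.0 = fully invariant)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())

# A controlled transformation that a "structure" embedding should ignore
# (here simulated as small additive perturbation of the embedding).
emb_t = emb + 0.1 * rng.normal(size=emb.shape)

print(f"probe accuracy: {probe_accuracy(emb, labels, n_classes):.2f}")
print(f"invariance:     {invariance(emb, emb_t):.2f}")
```

In this style of evaluation, a high probe accuracy indicates the embedding encodes the attribute, while a high invariance score under a transformation the embedding is meant to ignore indicates the intended semantics hold; mismatches between the two are the kind of inconsistency the paper reports.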
