[2603.22677] MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
Computer Science > Artificial Intelligence
arXiv:2603.22677 (cs) [Submitted on 24 Mar 2026]

Title: MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
Authors: Di Zhu, Zixuan Li

Abstract: Distributional metrics such as Fréchet Audio Distance cannot score individual music clips and correlate poorly with human judgments, while the only per-sample learned metric achieving high human correlation is closed-source. We introduce MuQ-Eval, an open-source per-sample quality metric for AI-generated music, built by training lightweight prediction heads on frozen MuQ-310M features using MusicEval, a dataset of generated clips from 31 text-to-music systems with expert quality ratings. Our simplest model, frozen features with attention pooling and a two-layer MLP, achieves system-level SRCC = 0.957 and utterance-level SRCC = 0.838 with human mean opinion scores. A systematic ablation over training objectives and adaptation strategies shows that no addition meaningfully improves the frozen baseline, indicating that frozen MuQ representations already capture quality-relevant information. Encoder choice is the dominant design factor, outweighing all architectural and training decisions. LoRA-adapted models trained on as few as 150 clips already achieve usable correlation, enabling person...
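The abstract's simplest model is a small head on top of frozen encoder features: attention pooling collapses the frame sequence into one vector, and a two-layer MLP maps it to a scalar quality score. The sketch below illustrates that shape in plain NumPy; all dimensions and weights are illustrative assumptions, not the authors' released configuration or the MuQ API.

```python
# Hedged sketch of the head described in the abstract: attention pooling over
# frozen frame-level features, then a two-layer MLP regressor to a scalar score.
# Dimensions, initialization, and names are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(frames, w):
    """frames: (T, D) frozen encoder features; w: (D,) learned attention vector."""
    weights = softmax(frames @ w)          # (T,) attention weights, sum to 1
    return weights @ frames                # (D,) weighted sum over frames

def mlp_head(pooled, W1, b1, W2, b2):
    """Two-layer MLP mapping a pooled feature vector to a scalar MOS estimate."""
    h = np.maximum(0.0, pooled @ W1 + b1)  # ReLU hidden layer
    return float(h @ W2 + b2)              # scalar quality prediction

# Illustrative sizes: T frames of D-dim frozen features, H hidden units.
T, D, H = 50, 1024, 256
frames = rng.standard_normal((T, D))
w = rng.standard_normal(D)
W1, b1 = rng.standard_normal((D, H)) * 0.01, np.zeros(H)
W2, b2 = rng.standard_normal(H) * 0.01, 0.0

score = mlp_head(attention_pool(frames, w), W1, b1, W2, b2)
```

In training, only `w`, `W1`, `b1`, `W2`, and `b2` would be fit (e.g. against MusicEval mean opinion scores), while the encoder producing `frames` stays frozen, which is what makes the head lightweight.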