[2603.00610] CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
Computer Science > Sound

arXiv:2603.00610 (cs) [Submitted on 28 Feb 2026]

Title: CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Authors: Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos

Abstract: While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, and audio prompts. We first introduce CMI-Pref-Pseudo, a large-scale preference dataset comprising 110k pseudo-labeled samples, and CMI-Pref, a high-quality, human-annotated corpus tailored for fine-grained alignment tasks. To unify the evaluation landscape, we propose CMI-RewardBench, a benchmark that evaluates music reward models on heterogeneous samples across musicality, text-music alignment, and compositional instruction alignment. Leveraging these resources, we develop CMI reward models (CMI-RMs), a parameter-efficient reward model family capable ...