[2512.05430] ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering
Summary
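The paper introduces ArtistMus, a benchmark of 1,000 questions on 500 diverse artists, and MusWikiDB, a vector database of 3.2M passages from music-related Wikipedia pages. Together they support systematic evaluation of retrieval-augmented generation (RAG) for music question answering (MQA).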
Why It Matters
Large language models handle music-related questions poorly because music knowledge is sparse in their pretraining data, and few existing resources support factual and contextual MQA grounded in artist metadata or historical context. This work fills that gap with a structured benchmark and a retrieval corpus, and shows that artist-centric retrieval substantially improves factual accuracy.
Key Takeaways
- ArtistMus provides a benchmark of 1,000 questions on 500 diverse artists.
- MusWikiDB contains 3.2M passages from 144K music-related Wikipedia pages.
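- Retrieval-augmented generation markedly improves factual accuracy in music question answering: open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8).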
- With RAG, open-source models approach the performance of proprietary models.
- The resources released aim to advance research in music information retrieval.
Computer Science > Computation and Language
arXiv:2512.05430 (cs)
[Submitted on 5 Dec 2025 (v1), last revised 15 Feb 2026 (this version, v2)]
Title: ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering
Authors: Daeyong Kwon, SeungHeon Doh, Juhan Nam
Abstract: Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. ...
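To make the retrieval-augmented setup in the abstract concrete, the following is a minimal sketch of the general RAG pattern: retrieve the passage most similar to a question, then place it in the prompt ahead of the question. It is not the paper's pipeline. The passages, the bag-of-words cosine similarity (standing in for a learned embedding model and a vector database such as MusWikiDB), and the names `retrieve`, `build_prompt`, and `point_gain` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Invented toy passages for illustration; NOT taken from MusWikiDB.
PASSAGES = [
    "Artist A is a jazz pianist who debuted in 1998 with the album First Light.",
    "Artist B is a rock band formed in 2005 and known for the single Night Drive.",
    "Artist C is a folk singer whose debut album appeared in 2012.",
]


def bow(text):
    """Lowercased bag-of-words counts (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = bow(question)
    ranked = sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, contexts):
    """Prepend retrieved passages to the question (hypothetical prompt format)."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


def point_gain(before, after):
    """Accuracy gain in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


question = "When did the rock band Artist B form?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)  # this string would be sent to an LLM
gain = point_gain(35.0, 91.8)  # the Qwen3 8B figures quoted in the abstract
```

Here `top` holds the passage about Artist B, and `gain` reproduces the reported improvement as a difference in percentage points (91.8 − 35.0), not a relative percentage change. A real system would swap the word-count similarity for dense embeddings and an approximate nearest-neighbour index.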