[2512.05430] ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering

[2512.05430] ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering

arXiv - Machine Learning 4 min read Article

Summary

ArtistMus introduces a benchmark for music question answering, leveraging a diverse dataset to enhance retrieval-augmented generation models in music-related contexts.

Why It Matters

This research addresses the gap in effective music information retrieval by providing a structured benchmark and dataset, enabling improved performance in music question answering. It highlights the importance of artist-centric data in enhancing the capabilities of large language models in a culturally rich domain.

Key Takeaways

  • ArtistMus provides a benchmark of 1,000 questions on 500 diverse artists.
  • MusWikiDB contains 3.2M passages from 144K music-related Wikipedia pages.
  • Retrieval-augmented generation significantly improves factual accuracy in music question answering.
  • Open-source models can achieve performance close to proprietary models with RAG techniques.
  • The resources released aim to advance research in music information retrieval.

Computer Science > Computation and Language arXiv:2512.05430 (cs) [Submitted on 5 Dec 2025 (v1), last revised 15 Feb 2026 (this version, v2)] Title:ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering Authors:Daeyong Kwon, SeungHeon Doh, Juhan Nam View a PDF of the paper titled ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering, by Daeyong Kwon and 2 other authors View PDF HTML (experimental) Abstract:Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. ...

Related Articles

Iran threatens ‘complete and utter annihilation’ of OpenAI's $30B Stargate AI data center in Abu Dhabi — regime posts video with satellite imagery of ChatGPT-maker's premier 1GW data center
Llms

Iran threatens ‘complete and utter annihilation’ of OpenAI's $30B Stargate AI data center in Abu Dhabi — regime posts video with satellite imagery of ChatGPT-maker's premier 1GW data center

Iran's Islamic Revolutionary Guard Corps (IRGC) issued this specific threat in a video update.

AI Tools & Products · 5 min ·
AI Desktop 98 lets you chat with Claude, ChatGPT, and Gemini through a Windows 98-inspired interface
Llms

AI Desktop 98 lets you chat with Claude, ChatGPT, and Gemini through a Windows 98-inspired interface

AI Tools & Products · 3 min ·
Anthropic Restricts Claude Agent Access Amid AI Automation Boom in Crypto
Llms

Anthropic Restricts Claude Agent Access Amid AI Automation Boom in Crypto

Anthropic cut Claude subscription access for Openclaw on April 4, pushing crypto AI agent users to pay-as-you-go billing.

AI Tools & Products · 7 min ·
I hit Claude’s new usage limits — and It changed how I use AI forever
Llms

I hit Claude’s new usage limits — and It changed how I use AI forever

Claude's message limits are dynamic, meaning they change based on site demand which is why I recommend using "Mega-Prompts" and utilizing...

AI Tools & Products · 8 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime