[2602.21476] A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation

arXiv - Machine Learning, February 26, 2026

Summary

This paper presents a knowledge-driven approach for audio segmentation and source separation, utilizing music scores and model-based techniques to enhance performance without relying on pre-segmented training data.

Why It Matters

The research addresses a limitation of traditional audio processing methods: their dependence on annotated data. By leveraging knowledge from music scores, the proposed approach offers a more autonomous and potentially more effective means of segmenting and separating audio, which matters for applications in the music and film industries.

Key Takeaways

Introduces a model-based framework for audio segmentation and source separation.
Utilizes knowledge from music scores to improve segmentation accuracy.
Demonstrates superior results in music and cinematic audio applications compared to data-driven techniques.

Electrical Engineering and Systems Science > Audio and Speech Processing
arXiv:2602.21476 (eess)
[Submitted on 25 Feb 2026]

Title: A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation

Authors: Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li, Chin-Hui Lee

Abstract: We propose a knowledge-driven, model-based approach to segmenting audio into single-category and mixed-category chunks with applications to source separation. "Knowledge" here denotes information associated with the data, such as music scores. "Model" here refers to tools that can be used for audio segmentation and recognition, such as hidden Markov models. In contrast to conventional learning that often relies on annotated data with given segment categories and their corresponding boundaries to guide the learning process, the proposed framework does not depend on any pre-segmented training data and learns directly from the input audio and its related knowledge sources to build all necessary models autonomously. Evaluation on simulation data shows that score-guided learning achieves very good music segmentation and separation results. Tested on movie track data for cinematic audio source separation also shows that util...
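The abstract names hidden Markov models as one tool for segmenting audio into single-category and mixed-category chunks. As a rough illustration of that general idea only, the sketch below runs standard Viterbi decoding over per-frame category scores and collapses the result into chunks. It is not the authors' method: the category labels, the transition probabilities, and the hand-picked emission scores are all hypothetical, and the paper's key step (building the models from audio and score knowledge without pre-segmented data) is not shown here.

```python
import math


def viterbi_segment(log_emis, log_trans, log_init):
    """Return the most likely state index for every frame.

    log_emis[t][s]  : log-likelihood of frame t under state s
    log_trans[p][s] : log-probability of moving from state p to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emis)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emis[t][s])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path, labels):
    """Collapse a frame-level path into (label, start, end_exclusive) chunks."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((labels[path[start]], start, t))
            start = t
    return segments


# Hypothetical categories: two single-category states and one mixed state.
labels = ["music", "speech", "mixed"]

# Hypothetical "sticky" transitions: stay with 0.9, switch with 0.05 each.
log_trans = [[math.log(0.9 if p == s else 0.05) for s in range(3)]
             for p in range(3)]
log_init = [math.log(1.0 / 3.0)] * 3

# Hand-picked per-frame likelihoods (toy data, not real audio features).
# Frame 3 is deliberately ambiguous to show the smoothing effect.
frame_probs = [
    [0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8],
]
log_emis = [[math.log(p) for p in frame] for frame in frame_probs]

path = viterbi_segment(log_emis, log_trans, log_init)
segments = to_segments(path, labels)
print(segments)
```

In this toy setup the ambiguous frame is absorbed into the surrounding "music" chunk because switching states is penalized, which is the usual reason to decode with an HMM rather than labeling each frame independently.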
