[2602.19778] Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation
Summary
The paper presents a novel two-stage training approach for Automatic Chord Recognition (ACR), utilizing pseudo-labeling and knowledge distillation to enhance model performance with limited labeled data.
Why It Matters
This research addresses the scarcity of aligned labeled data in ACR, a key bottleneck for music analysis applications. By leveraging pre-trained models and large amounts of unlabeled audio, the proposed method improves both the data efficiency and the accuracy of chord recognition, making it significant for advances in music technology and machine learning.
Key Takeaways
- The proposed method uses a two-stage training pipeline to enhance ACR.
- Pseudo-labeling allows models to learn from unlabeled audio effectively.
- Knowledge distillation helps retain learned representations during training.
- The BTC student model outperforms traditional supervised learning baselines.
- Significant improvements are observed in recognizing rare chord qualities.
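Stage 1 of the pipeline above uses a frozen teacher to label unlabeled audio for the student. The sketch below illustrates the idea with numpy; the function names, the frame-wise logit representation, and especially the confidence threshold are our own illustrative assumptions, not details taken from the paper (which trains the student on the teacher's pseudo-labels directly).

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the chord-class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def generate_pseudo_labels(teacher_logits, confidence_threshold=0.8):
    """Stage 1 (sketch): convert frame-wise teacher chord logits into hard
    pseudo-labels. The confidence mask is a hypothetical extra filter."""
    probs = softmax(teacher_logits)             # (n_frames, n_chords)
    labels = probs.argmax(axis=-1)              # hard chord index per frame
    keep = probs.max(axis=-1) >= confidence_threshold
    return labels, keep

# Toy example: 4 audio frames, 3 chord classes.
logits = np.array([[4.0, 0.1, 0.2],
                   [0.3, 0.2, 0.4],
                   [0.1, 5.0, 0.1],
                   [2.0, 1.9, 2.1]])
labels, keep = generate_pseudo_labels(logits)
print(labels)   # teacher's argmax chord per frame
print(keep)     # frames confident enough to train the student on
```

In the paper's setting the teacher is the pre-trained BTC model and the unlabeled pool is over 1,000 hours of diverse audio; the student (BTC or 2E1D) is then trained on the resulting pseudo-labels alone.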
Computer Science > Sound — arXiv:2602.19778 (cs) [Submitted on 23 Feb 2026]
Authors: Nghia Phan, Rong Jin, Gang Liu, Xiao Dong
Abstract: Automatic Chord Recognition (ACR) is constrained by the scarcity of aligned chord labels, as well-aligned annotations are costly to acquire. At the same time, open-weight pre-trained models are currently more accessible than their proprietary training data. In this work, we present a two-stage training pipeline that leverages pre-trained models together with unlabeled audio. The proposed method decouples training into two stages. In the first stage, we use a pre-trained BTC model as a teacher to generate pseudo-labels for over 1,000 hours of diverse unlabeled audio and train a student model solely on these pseudo-labels. In the second stage, the student is continually trained on ground-truth labels as they become available, with selective knowledge distillation (KD) from the teacher applied as a regularizer to prevent catastrophic forgetting of the representations learned in the first stage. In our experiments, two models (BTC, 2E1D) were used as students. In stage 1, using only pseudo-labels, the BTC student achieves over 98% of the teacher's performance, while the 2E1D model...
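In stage 2, the abstract describes supervised training on ground-truth labels with KD from the teacher acting as a regularizer against catastrophic forgetting. A minimal numpy sketch of such a combined loss is below; the KD weight, the temperature, and the use of a plain KL term (rather than whatever "selective" criterion the paper applies) are all our assumptions for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the chord-class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, targets):
    """Mean cross-entropy against integer ground-truth chord labels."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

def kd_kl(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the standard KD regularizer (temperature is a hypothetical choice)."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1))

def stage2_loss(student_logits, teacher_logits, gt_labels, kd_weight=0.5):
    """Stage 2 (sketch): supervised CE on ground truth plus a KD term that
    pulls the student back toward the frozen teacher's predictions."""
    return (cross_entropy(student_logits, gt_labels)
            + kd_weight * kd_kl(student_logits, teacher_logits))

# Toy check: with identical student and teacher logits the KD term
# vanishes and the loss reduces to plain cross-entropy.
x = np.array([[2.0, 0.5, 0.1],
              [0.2, 3.0, 0.4]])
y = np.array([0, 1])
loss = stage2_loss(x, x, y)
```

The KD term only anchors the student to the teacher; when the teacher disagrees with fresh ground truth, the cross-entropy term still dominates for large enough label sets, which is what lets the student eventually surpass the teacher on rare chord qualities.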