[2602.19778] Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation

arXiv - Machine Learning

Summary

The paper presents a novel two-stage training approach for Automatic Chord Recognition (ACR), utilizing pseudo-labeling and knowledge distillation to enhance model performance with limited labeled data.

Why It Matters

This research addresses the scarcity of labeled data in ACR, a key bottleneck for music analysis and its downstream applications. By leveraging pre-trained models and unlabeled audio, the proposed method improves chord-recognition accuracy without requiring additional annotation, making it relevant to both music technology and machine learning more broadly.

Key Takeaways

  • The proposed method uses a two-stage training pipeline to enhance ACR.
  • Pseudo-labeling allows models to learn from unlabeled audio effectively.
  • Knowledge distillation helps retain learned representations during training.
  • The BTC student model outperforms traditional supervised learning baselines.
  • Significant improvements are observed in recognizing rare chord qualities.
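
Stage 1 of the pipeline above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: `teacher_predict` is a hypothetical callable standing in for the frozen pre-trained BTC teacher, and the confidence filter is a common pseudo-labeling variant that the paper may or may not use.

```python
import numpy as np

def generate_pseudo_labels(teacher_predict, feature_batches, min_conf=0.0):
    """Stage 1 sketch: run a frozen teacher over unlabeled audio features
    and keep its argmax chord predictions as training targets for the
    student. `teacher_predict` is a hypothetical callable mapping
    per-frame features (frames, dim) to chord-class probabilities
    (frames, num_chords)."""
    pairs = []
    for feats in feature_batches:
        probs = teacher_predict(feats)          # (frames, num_chords)
        labels = probs.argmax(axis=-1)          # hard pseudo-labels per frame
        keep = probs.max(axis=-1) >= min_conf   # optional confidence filter
        pairs.append((feats[keep], labels[keep]))
    return pairs
```

The student is then trained on the returned (features, pseudo-label) pairs exactly as it would be on ground-truth annotations.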

Computer Science > Sound | arXiv:2602.19778 (cs) | Submitted on 23 Feb 2026

Title: Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation
Authors: Nghia Phan, Rong Jin, Gang Liu, Xiao Dong

Abstract: Automatic Chord Recognition (ACR) is constrained by the scarcity of aligned chord labels, as well-aligned annotations are costly to acquire. At the same time, open-weight pre-trained models are currently more accessible than their proprietary training data. In this work, we present a two-stage training pipeline that leverages pre-trained models together with unlabeled audio. The proposed method decouples training into two stages. In the first stage, we use a pre-trained BTC model as a teacher to generate pseudo-labels for over 1,000 hours of diverse unlabeled audio and train a student model solely on these pseudo-labels. In the second stage, the student is continually trained on ground-truth labels as they become available, with selective knowledge distillation (KD) from the teacher applied as a regularizer to prevent catastrophic forgetting of the representations learned in the first stage. In our experiments, two models (BTC, 2E1D) were used as students. In stage 1, using only pseudo-labels, the BTC student achieves over 98% of the teacher's performance, while the 2E1D model...
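
The stage-2 objective described in the abstract, supervised cross-entropy plus KD from the teacher as a regularizer, can be sketched as below. This is a minimal illustration: the temperature `T`, the weight `lam`, and the use of plain (non-selective) KL divergence are assumptions for clarity, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    # temperature-scaled softmax over the last axis, numerically stabilized
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stage2_loss(student_logits, teacher_logits, labels, lam=0.5, T=2.0):
    """Stage 2 sketch: cross-entropy on ground-truth chord labels plus a
    KD regularizer, KL(teacher || student) on temperature-softened
    distributions, which discourages drift from the stage-1 teacher."""
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + 1e-12))
    pt = softmax(teacher_logits, T)
    ps = softmax(student_logits, T)
    kd = np.mean(np.sum(pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12)),
                        axis=-1)) * T * T
    return ce + lam * kd
```

When student and teacher agree, the KD term vanishes and the loss reduces to ordinary supervised cross-entropy; the `T * T` factor is the standard scaling that keeps gradient magnitudes comparable across temperatures.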
