[2509.21739] Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription
Computer Science > Sound
arXiv:2509.21739 (cs)
[Submitted on 26 Sep 2025 (v1), last revised 5 Mar 2026 (this version, v2)]
Title: Noise-to-Notes: Diffusion-based Generation and Refinement for Automatic Drum Transcription
Authors: Michael Yeung, Keisuke Toyama, Toya Teramoto, Shusuke Takahashi, Tamaki Kojima
Abstract: Automatic drum transcription (ADT) is traditionally formulated as a discriminative task to predict drum events from audio spectrograms. In this work, we redefine ADT as a conditional generative task and introduce Noise-to-Notes (N2N), a framework leveraging diffusion modeling to transform audio-conditioned Gaussian noise into drum events with associated velocities. This generative diffusion approach offers distinct advantages, including a flexible speed-accuracy trade-off and strong inpainting capabilities. However, the generation of binary onset and continuous velocity values presents a challenge for diffusion models, and to overcome this, we introduce an Annealed Pseudo-Huber loss to facilitate effective joint optimization. Finally, to augment low-level spectrogram features, we propose incorporating features extracted from music foundation models (MFMs), which capture high-level semantic information and enhance robustness to out-of-domain drum audio. Experimental results demo...
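The abstract names an Annealed Pseudo-Huber loss but does not define it here. As a rough illustration only, the sketch below uses the standard pseudo-Huber form, delta^2 * (sqrt(1 + (r/delta)^2) - 1), and pairs it with a hypothetical annealing schedule for delta. The function names, the geometric schedule, its direction (large delta to small delta), and the default values are all assumptions made for this example and may differ from what the paper actually does.

```python
import math


def pseudo_huber(residual: float, delta: float) -> float:
    """Standard pseudo-Huber loss for a single residual.

    Behaves like residual**2 / 2 when |residual| << delta and like
    delta * |residual| when |residual| >> delta.
    """
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def annealed_delta(step: int, total_steps: int,
                   delta_start: float = 1.0, delta_end: float = 0.01) -> float:
    """Hypothetical schedule (an assumption, not taken from the paper):
    geometric interpolation of delta from delta_start to delta_end."""
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    return delta_start * (delta_end / delta_start) ** t


def annealed_pseudo_huber(pred, target, step: int, total_steps: int) -> float:
    """Mean pseudo-Huber loss over paired values, with delta set by the schedule."""
    delta = annealed_delta(step, total_steps)
    return sum(pseudo_huber(p - y, delta) for p, y in zip(pred, target)) / len(pred)
```

Under this assumed schedule the loss is close to a scaled squared error early in training and moves toward an absolute-error-like penalty as delta shrinks. Whether that matches the paper's motivation for handling binary onsets and continuous velocities together is not stated in the abstract.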