[2602.16343] How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

arXiv - Machine Learning

Summary

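This paper examines how to label audio resynthesized by neural audio codecs when training and evaluating deepfake detectors. Because codecs serve both audio compression and speech synthesis, codec-resynthesized audio can plausibly be labeled either bonafide or spoof, and the authors study how this labeling choice affects detection performance.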

Why It Matters

As audio deepfakes become more sophisticated, detection systems need a consistent rule for labeling resynthesized audio. Very little prior research has addressed how codec-resynthesized audio should be labeled. This study fills that gap with a new dataset and an analysis of labeling strategies, offering insights that can improve the reliability of audio verification systems.

Key Takeaways

  • Neural audio codecs serve two purposes: compressing audio for storage and transmission, and discretizing speech for language-model-based speech synthesis.
  • Because of this dual role, codec-resynthesized audio can be labeled as either bonafide or spoof.
  • The study presents a challenging extension of the ASVspoof 5 dataset built to investigate this labeling question.
  • The authors examine how different labeling choices affect detection performance and offer guidance on labeling strategies.
  • Settling how to label codec-resynthesized audio matters for advancing audio deepfake detection.

arXiv:2602.16343 [cs.SD], submitted on 18 Feb 2026

Title: How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
Authors: Yixuan Xiao, Florian Lux, Alejandro Pérez-González-de-Martos, Ngoc Thang Vu

Abstract: Since Text-to-Speech systems typically don't produce waveforms directly, recent spoof detection studies use resynthesized waveforms from vocoders and neural audio codecs to simulate an attacker. Unlike vocoders, which are specifically designed for speech synthesis, neural audio codecs were originally developed for compressing audio for storage and transmission. However, their ability to discretize speech also sparked interest in language-modeling-based speech synthesis. Owing to this dual functionality, codec resynthesized data may be labeled as either bonafide or spoof. So far, very little research has addressed this issue. In this study, we present a challenging extension of the ASVspoof 5 dataset constructed for this purpose. We examine how different labeling choices affect detection performance and provide insights into labeling strategies.

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
Cite as: arXiv:2602.16343 [cs.SD] (or arXiv:2602.16343v1 [cs.SD] for this version) https://doi.org/10.48550/a...
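To make the labeling question concrete, here is a minimal Python sketch, assuming a toy dataset in which every utterance carries a provenance tag. The `Source` enum and the `label` and `label_counts` functions are hypothetical names for illustration only and do not come from the paper; the sketch is not the authors' method. It shows only that original speech and vocoder output have fixed labels, while codec-resynthesized audio flips between bonafide and spoof depending on the chosen policy.

```python
from enum import Enum
from typing import Dict, Iterable


class Source(Enum):
    """Hypothetical provenance tags for utterances (illustrative only)."""
    ORIGINAL = "original"  # unprocessed human speech
    VOCODER = "vocoder"    # waveform resynthesized by a speech vocoder
    CODEC = "codec"        # audio encoded and decoded by a neural audio codec


def label(source: Source, codec_is_spoof: bool) -> str:
    """Map one utterance's provenance to a binary label under a labeling policy.

    Original speech is always bonafide and vocoder output is always spoof;
    only codec-resynthesized audio depends on the policy flag.
    """
    if source is Source.ORIGINAL:
        return "bonafide"
    if source is Source.VOCODER:
        return "spoof"
    # Codec output: the synthesis view says spoof, the compression view says bonafide.
    return "spoof" if codec_is_spoof else "bonafide"


def label_counts(sources: Iterable[Source], codec_is_spoof: bool) -> Dict[str, int]:
    """Count how many utterances land in each class under the chosen policy."""
    counts = {"bonafide": 0, "spoof": 0}
    for source in sources:
        counts[label(source, codec_is_spoof)] += 1
    return counts


if __name__ == "__main__":
    data = [Source.ORIGINAL, Source.ORIGINAL, Source.VOCODER, Source.CODEC, Source.CODEC]
    print(label_counts(data, codec_is_spoof=True))   # {'bonafide': 2, 'spoof': 3}
    print(label_counts(data, codec_is_spoof=False))  # {'bonafide': 4, 'spoof': 1}
```

Flipping `codec_is_spoof` relabels only the codec rows, so the same audio yields two differently labeled datasets with different class balances. That divergence is the kind of choice whose effect on detection performance the paper examines.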
