[2509.14959] Discrete optimal transport is a strong audio adversarial attack

arXiv - AI, February 20, 2026

Summary

The paper introduces discrete optimal transport voice conversion (kDOT-VC) and shows that it works as a black-box adversarial attack against audio anti-spoofing countermeasures.

Why It Matters

The work exposes a weakness in audio anti-spoofing countermeasures, which voice-based systems rely on to reject synthetic speech. Knowing how such an attack succeeds can inform stronger defenses.

Key Takeaways

kDOT-VC shows stronger domain adaptation than kNN-VC, SinkVC, and Gaussian optimal transport (MKL).
The method works as a black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs).
The attack is a post-processing step: frame-level WavLM embeddings of generated speech are aligned to an unpaired pool of bona fide speech, then decoded with a neural vocoder.
The alignment uses entropic optimal transport followed by a top-k barycentric projection, relying on the probabilistic nature of optimal transport.
Ablation analysis indicates that distribution-level alignment is a powerful and stable attack against deployed CMs.

arXiv:2509.14959 (eess)
[Submitted on 18 Sep 2025 (v1), last revised 18 Feb 2026 (this version, v2)]

Title: Discrete optimal transport is a strong audio adversarial attack
Authors: Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan

Abstract: In this paper, we introduce the discrete optimal transport voice conversion ($k$DOT-VC) method. Comparison with $k$NN-VC, SinkVC, and Gaussian optimal transport (MKL) demonstrates stronger domain adaptation abilities of our method. We use the probabilistic nature of optimal transport (OT) and show that $k$DOT-VC is an effective black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs). Our attack operates as a post-processing, distribution-alignment step: frame-level WavLM embeddings of generated speech are aligned to an unpaired bona fide pool via entropic OT and a top-$k$ barycentric projection, then decoded with a neural vocoder. Ablation analysis indicates that distribution-level alignment is a powerful and stable attack for deployed CMs.

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as: arXiv:2509.14959 [eess.AS] (or arXiv:2509.14959v2 [eess.AS] for this version)
https://doi.org/10.48550/arXiv.2509.14959
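
The alignment step named in the abstract (entropic OT followed by a top-k barycentric projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Random vectors stand in for WavLM embeddings, and the squared Euclidean cost, uniform marginals, cost normalization, default values of `k` and `eps`, and the function names `sinkhorn_plan`, `topk_barycentric_projection`, and `align_to_pool` are all assumptions made for illustration. The vocoder stage is omitted.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via plain Sinkhorn iterations.

    Assumption: uniform weights on both sides; the paper may weight frames differently.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # source marginal
    b = np.full(m, 1.0 / m)          # target marginal
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def topk_barycentric_projection(plan, target, k=4):
    """Map each source frame to a weighted mean of its k heaviest target frames."""
    # Indices of the k largest plan entries in each row (order within the k is arbitrary).
    idx = np.argpartition(-plan, k - 1, axis=1)[:, :k]
    w = np.take_along_axis(plan, idx, axis=1)
    w = w / w.sum(axis=1, keepdims=True)             # renormalize the kept mass
    return np.einsum("nk,nkd->nd", w, target[idx])   # (n, d) projected frames


def align_to_pool(source, target, k=4, eps=0.1):
    """Hypothetical end-to-end helper: cost matrix, Sinkhorn plan, top-k projection."""
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()         # assumption: normalize so eps has a stable scale
    plan = sinkhorn_plan(cost, eps=eps)
    return topk_barycentric_projection(plan, target, k=k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    generated = rng.normal(size=(50, 16)) + 3.0   # stand-in for generated-speech frames
    bona_fide = rng.normal(size=(80, 16))         # stand-in for the unpaired bona fide pool
    aligned = align_to_pool(generated, bona_fide)
    print(aligned.shape)
```

Because each output frame is a convex combination of bona fide frames, the projected frames stay inside the range spanned by the pool, which is the distribution-level effect the abstract describes. Setting `k=1` reduces the projection to picking the single pool frame with the most transported mass.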
