[2509.14959] Discrete optimal transport is a strong audio adversarial attack

Summary

The paper introduces discrete optimal transport voice conversion (kDOT-VC) and shows that it is an effective black-box adversarial attack against audio anti-spoofing countermeasures.

Why It Matters

The work exposes vulnerabilities in audio anti-spoofing countermeasures, which voice recognition systems rely on for security. Understanding these weaknesses can inform stronger defenses and more robust systems.

Key Takeaways

  • kDOT-VC shows stronger domain adaptation than kNN-VC, SinkVC, and Gaussian optimal transport (MKL).
  • The method serves as a black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs).
  • The attack is a post-processing step: frame-level WavLM embeddings of generated speech are aligned to an unpaired bona fide pool, then decoded with a neural vocoder.
  • The attack exploits the probabilistic nature of optimal transport.
  • Ablation analysis indicates that distribution-level alignment is a powerful and stable attack on deployed CMs.

arXiv:2509.14959 (eess)
Submitted on 18 Sep 2025 (v1), last revised 18 Feb 2026 (this version, v2)

Title: Discrete optimal transport is a strong audio adversarial attack
Authors: Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan

Abstract: In this paper, we introduce the discrete optimal transport voice conversion (kDOT-VC) method. Comparison with kNN-VC, SinkVC, and Gaussian optimal transport (MKL) demonstrates stronger domain adaptation abilities of our method. We use the probabilistic nature of optimal transport (OT) and show that kDOT-VC is an effective black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs). Our attack operates as a post-processing, distribution-alignment step: frame-level WavLM embeddings of generated speech are aligned to an unpaired bona fide pool via entropic OT and a top-k barycentric projection, then decoded with a neural vocoder. Ablation analysis indicates that distribution-level alignment is a powerful and stable attack for deployed CMs.

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as: arXiv:2509.14959 [eess.AS] (or arXiv:2509.14959v2 [eess.AS] for this version)
https://doi.org/10.48550/arXiv.2509.14959
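
The alignment step named in the abstract (entropic OT followed by a top-k barycentric projection) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not state the cost function, the regularization strength, the value of k, or the marginals, so the cosine cost, uniform marginals, eps=0.1, and k=4 below are all assumptions, and the function names are hypothetical. Random arrays stand in for WavLM frame embeddings, and the vocoder decoding step is omitted.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y (assumed cost)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals, computed by Sinkhorn iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform source marginal (assumption)
    b = np.full(m, 1.0 / m)   # uniform target marginal (assumption)
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def topk_barycentric(plan, target, k=4):
    """Map each source frame to the plan-weighted mean of its k heaviest target frames."""
    idx = np.argpartition(-plan, k - 1, axis=1)[:, :k]   # indices of the k largest plan entries per row
    w = np.take_along_axis(plan, idx, axis=1)
    w = w / w.sum(axis=1, keepdims=True)                 # renormalize the kept weights
    return np.einsum("nk,nkd->nd", w, target[idx])

# Stand-ins for frame-level embeddings: generated speech (source) and a bona fide pool (target).
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 16))
tgt = rng.normal(loc=0.5, size=(80, 16))

plan = sinkhorn(cosine_cost(src, tgt))
aligned = topk_barycentric(plan, tgt, k=4)
```

Each row of `aligned` is a convex combination of k frames from the target pool, so the aligned source frames lie inside the region spanned by the bona fide embeddings. In a full pipeline of the kind the abstract describes, these aligned embeddings would then be passed to a neural vocoder.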
