[2604.06327] A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
Computer Science > Sound

arXiv:2604.06327 (cs) [Submitted on 7 Apr 2026]

Title: A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech

Authors: Jia-Hong Huang, Seulgi Kim, Yi Chieh Liu, Yixian Shen, Hongyi Zhu, Prayag Tiwari, Stevan Rudinac, Evangelos Kanoulas

Abstract: Recent diffusion-based text-to-speech (TTS) models achieve high naturalness and expressiveness, yet often suffer from speaker drift, a subtle, gradual shift in perceived speaker identity within a single utterance. This underexplored phenomenon undermines the coherence of synthetic speech, especially in long-form or interactive settings. We introduce the first automatic framework for detecting speaker drift by formulating it as a binary classification task over utterance-level speaker consistency. Our method computes cosine similarity across overlapping segments of synthesized speech and prompts large language models (LLMs) with structured representations to assess drift. We provide theoretical guarantees for cosine-based drift detection and demonstrate that speaker embeddings exhibit meaningful geometric clustering on the unit sphere. To support evaluation, we construct a high-quality synthetic benchmark with human-validated speaker drift annotations. Experiments with multiple state-of-the-art LLMs confirm the via...
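The abstract's core signal, cosine similarity between speaker embeddings of adjacent overlapping segments, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the embedding dimension (192), the `detect_drift` helper, and the similarity threshold of 0.85 are all assumptions, and the toy embeddings stand in for outputs of a real speaker-embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_drift(embeddings, threshold=0.85):
    # Binary drift decision over utterance-level consistency:
    # flag drift if any adjacent segment pair falls below the
    # (hypothetical) similarity threshold.
    sims = [cosine_similarity(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]
    return any(s < threshold for s in sims), sims

# Toy data: unit-norm "speaker embeddings" on the sphere.
# Three segments cluster near one identity; the last one drifts.
rng = np.random.default_rng(0)
base = rng.normal(size=192)
base /= np.linalg.norm(base)
stable = [base + 0.01 * rng.normal(size=192) for _ in range(3)]
drifted = -base  # opposite direction on the sphere: a different "speaker"
embs = [e / np.linalg.norm(e) for e in stable] + [drifted]

flag, sims = detect_drift(embs)
```

Here `flag` is true because the final segment's embedding points away from the cluster formed by the first three, which matches the paper's framing of drift detection as a classification over geometric consistency on the unit sphere.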