[2509.19668] Selective Classifier-free Guidance for Zero-shot Text-to-speech

arXiv:2509.19668 (eess), Electrical Engineering and Systems Science > Audio and Speech Processing
Submitted on 24 Sep 2025 (v1), last revised 24 Mar 2026 (this version, v2)

Title: Selective Classifier-free Guidance for Zero-shot Text-to-speech
Authors: John Zheng, Farhad Maleki

Abstract: In zero-shot text-to-speech, achieving a balance between fidelity to the target speaker and adherence to text content remains a challenge. While classifier-free guidance (CFG) strategies have shown promising results in image generation, their application to speech synthesis is underexplored. Separating the conditions used for CFG enables trade-offs between different desired characteristics in speech synthesis. In this paper, we evaluate the adaptability of CFG strategies originally developed for image generation to speech synthesis and extend separated-condition CFG approaches for this domain. Our results show that CFG strategies effective in image generation generally fail to improve speech synthesis. We also find that we can improve speaker similarity while limiting degradation of text adherence by applying standard CFG during early timesteps and switching to selective CFG only in later timesteps. Surprisingly, we observe that the effectiveness of a selective CFG strategy is highly text-representation dependent, as differences betwe...
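
The timestep schedule described in the abstract (standard CFG early, selective CFG late) might look roughly like the sketch below. This is an illustration under assumptions, not the authors' implementation: the abstract does not define selective CFG precisely, so the sketch assumes it means guiding on the speaker condition alone, using a text-only prediction as the baseline. The function names, the `switch_fraction` parameter, and the default guidance scale are all hypothetical.

```python
import numpy as np


def standard_cfg(pred_full, pred_uncond, scale):
    """Standard CFG: extrapolate from the unconditional prediction
    toward the prediction conditioned on both text and speaker."""
    return pred_uncond + scale * (pred_full - pred_uncond)


def selective_cfg(pred_full, pred_text_only, scale):
    """ASSUMED selective variant: the baseline keeps the text condition
    and drops only the speaker prompt, so guidance amplifies the
    speaker condition alone."""
    return pred_text_only + scale * (pred_full - pred_text_only)


def scheduled_cfg(step, num_steps, pred_full, pred_text_only, pred_uncond,
                  scale=2.0, switch_fraction=0.5):
    """Use standard CFG for early steps and the assumed selective CFG
    for later steps. `switch_fraction` is a hypothetical knob."""
    if step < switch_fraction * num_steps:
        return standard_cfg(pred_full, pred_uncond, scale)
    return selective_cfg(pred_full, pred_text_only, scale)


# Toy model outputs standing in for one denoising step.
pred_full = np.array([1.0, 1.0])       # text + speaker conditioning
pred_text_only = np.array([0.5, 0.5])  # speaker prompt dropped
pred_uncond = np.array([0.0, 0.0])     # both conditions dropped

early = scheduled_cfg(0, 10, pred_full, pred_text_only, pred_uncond)
late = scheduled_cfg(9, 10, pred_full, pred_text_only, pred_uncond)
print(early, late)
```

With a guidance scale of 1, both branches reduce to the fully conditioned prediction, which is a quick sanity check on any variant of this schedule.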

Originally published on March 25, 2026. Curated by AI News.
