[2509.19668] Selective Classifier-free Guidance for Zero-shot Text-to-speech
Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2509.19668 (eess)

[Submitted on 24 Sep 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Selective Classifier-free Guidance for Zero-shot Text-to-speech

Authors: John Zheng, Farhad Maleki

Abstract: In zero-shot text-to-speech, achieving a balance between fidelity to the target speaker and adherence to text content remains a challenge. While classifier-free guidance (CFG) strategies have shown promising results in image generation, their application to speech synthesis is underexplored. Separating the conditions used for CFG enables trade-offs between different desired characteristics in speech synthesis. In this paper, we evaluate the adaptability of CFG strategies originally developed for image generation to speech synthesis and extend separated-condition CFG approaches for this domain. Our results show that CFG strategies effective in image generation generally fail to improve speech synthesis. We also find that we can improve speaker similarity while limiting degradation of text adherence by applying standard CFG during early timesteps and switching to selective CFG only in later timesteps. Surprisingly, we observe that the effectiveness of a selective CFG strategy is highly text-representation dependent, as differences betwe...
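The timestep-gated scheme described in the abstract (standard CFG early, selective CFG late) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `t_switch` parameter, and the choice of the speaker embedding as the "selected" condition are assumptions made for the example.

```python
def cfg_step(pred_uncond, pred_cond, scale):
    """Standard classifier-free guidance: extrapolate from the fully
    unconditional prediction toward the fully conditional one."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


def selective_cfg_step(pred_uncond, pred_text_only, pred_full,
                       scale, t, t_switch):
    """Hypothetical timestep-gated guidance. For early timesteps
    (t < t_switch) apply standard CFG; afterwards, apply selective CFG
    in which only the speaker condition is dropped in the negative
    branch (text is kept in both), so guidance acts on speaker identity
    while limiting degradation of text adherence."""
    if t < t_switch:
        return cfg_step(pred_uncond, pred_full, scale)
    # Selective CFG: the "negative" branch keeps the text condition,
    # so the guidance direction isolates the speaker condition.
    return pred_text_only + scale * (pred_full - pred_text_only)
```

In a diffusion- or flow-based TTS sampler, `selective_cfg_step` would replace the usual CFG combination at each denoising step, with `t_switch` tuned to trade speaker similarity against text adherence.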