I can't believe text normalization is so underdiscussed in streaming text-to-speech [D]
About this article
Kinda suprises me how little discussion there is around about mistakes in streaming TTS models People look for natural readers, high voice quality, expressive speech. And most models don't look dumb here and fail. They fail when you give them basic stuff like price, dates, URLs, promo codes, phone numbers. So I was looking for some info and found a benchmark that compares commercial real time streaming TTS models in terms of how they pronounce dates, URLs, acronyms, etc. They are checking 100...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket