TTS Arena: Benchmarking Text-to-Speech Models in the Wild
Published February 27, 2024

Authors: mrfakename, Vaibhav Srivastav (reach-vb), Clémentine Fourrier (clefourrier), Lucain Pouget (Wauplin), Yoach Lacombe (ylacombe), Main Horse (main-horse), Sanchit Gandhi (sanchit-gandhi)

Automated measurement of the quality of text-to-speech (TTS) models is very difficult. Assessing the naturalness and inflection of a voice is a trivial task for humans, but it is much harder for machines. This is why today, we're thrilled to announce the TTS Arena. Inspired by LMSys's Chatbot Arena for LLMs, we developed a tool that lets anyone easily compare TTS models side by side. Just submit some text, listen to two different models speak it, and vote on which you think is better. The results are organized into a leaderboard that displays the community's highest-rated models.

Motivation

The field of speech synthesis has long lacked an accurate method to measure the quality of different models. Objective metrics like WER (word error rate) are unreliable measures of model quality, and subjective measures such as MOS (mean opinion score) are typically small-scale experiments conducted with few listeners. As a result, these measurements are generally not useful for comparing two models of roughly similar quality. To address these drawbacks, we are inviting the community to rank models...
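To illustrate how pairwise votes can become a leaderboard, here is a minimal sketch of an Elo-style rating update, the approach popularized by LMSys's Chatbot Arena. This is an assumption for illustration only: the article does not specify TTS Arena's exact ranking method, and the model names and vote log below are hypothetical.

```python
def update_elo(ratings, winner, loser, k=32):
    """Update two models' ratings in place after a single pairwise vote.

    ratings: dict mapping model name -> current Elo rating.
    k: step size controlling how much one vote moves the ratings.
    """
    ra, rb = ratings[winner], ratings[loser]
    # Expected score of the winner under the logistic Elo model.
    expected = 1 / (1 + 10 ** ((rb - ra) / 400))
    ratings[winner] = ra + k * (1 - expected)
    ratings[loser] = rb - k * (1 - expected)


# Hypothetical vote log: (winning model, losing model) per comparison.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]

# Every model starts from the same baseline rating.
ratings = {m: 1000.0 for m in ("model_a", "model_b", "model_c")}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

# Sort models by rating, highest first, to produce the leaderboard.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Unlike a raw win count, this scheme weights each vote by the rating gap between the two models, so an upset over a strong model moves the ratings more than an expected win.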