[2603.19259] Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis
Computer Science > Computation and Language
arXiv:2603.19259 (cs)
[Submitted on 26 Feb 2026]

Title: Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis

Authors: Yu-Siang Lan, Chia-Sheng Liu, Yi-Chang Chen, Po-Chun Hsu, Allyson Chiu, Shun-Wen Lin, Da-shan Shiu, Yuan-Fu Liao

Abstract: Taiwanese Hokkien (Taigi) presents unique opportunities for advancing speech technology methodologies that can generalize to diverse linguistic contexts. We introduce Breeze Taigi, a comprehensive framework centered on standardized benchmarks for evaluating Taigi speech recognition and synthesis systems. Our primary contribution is a reproducible evaluation methodology that leverages parallel Taiwanese Mandarin resources. We provide 30 carefully curated Mandarin-Taigi audio pairs from Taiwan's Executive Yuan public service announcements with normalized ground truth transcriptions. We establish Character Error Rate (CER) as the standard metric and implement normalization procedures to enable fair cross-system comparisons. To demonstrate the benchmark's utility and provide reference implementations, we develop speech recognition and synthesis models through a methodology that leverages existing Taiwanese Mandarin resources and large-scale synthetic data generation. In particul...
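
The Character Error Rate named in the abstract is conventionally the character-level edit distance between a reference transcription and a system hypothesis, divided by the reference length. The following is a minimal sketch of that conventional definition, not the paper's implementation. The abstract does not specify its normalization procedures, so the `normalize` function here (NFKC folding, lowercasing, and removal of punctuation and whitespace) is a hypothetical stand-in, and the function names `normalize`, `edit_distance`, and `cer` are illustrative only.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalization (an assumption, not the paper's procedure):
    NFKC-fold, lowercase, then drop whitespace and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from the reference
                    curr[j - 1] + 1,     # insertion into the reference
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length,
    computed after applying the same normalization to both strings."""
    ref_n, hyp_n = normalize(reference), normalize(hypothesis)
    if not ref_n:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_n, hyp_n) / len(ref_n)
```

Applying one normalization to both the reference and the hypothesis before scoring is what makes scores comparable across systems that differ in punctuation or spacing conventions. Note that CER can exceed 1.0 when the hypothesis contains many insertions.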