[2603.29042] An Empirical Recipe for Universal Phone Recognition
arXiv:2603.29042 (cs)
[Submitted on 30 Mar 2026]

Title: An Empirical Recipe for Universal Phone Recognition

Authors: Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi, Eunjung Yeo, William Chen, Shinji Watanabe, David R. Mortensen

Abstract: Phone recognition (PR) is a key enabler of multilingual and low-resource speech processing tasks, yet robust performance remains elusive. Highly performant English-focused models do not generalize across languages, while multilingual models underutilize pretrained representations. It also remains unclear how data scale, architecture, and training objective contribute to multilingual PR. We present PhoneticXEUS -- trained on large-scale multilingual data and achieving state-of-the-art performance on both multilingual (17.7% PFER) and accented English speech (10.6% PFER). Through controlled ablations with evaluations across 100+ languages under a unified scheme, we empirically establish our training recipe and quantify the impact of SSL representations, data scale, and loss objectives. In addition, we analyze error patterns across language families, accented speech, and articulatory features. All data and code are released openly.

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing...