[2509.05425] No Text Needed: Forecasting MT Quality and Inequity from Fertility and Metadata
arXiv:2509.05425 (cs)
[Submitted on 5 Sep 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: No Text Needed: Forecasting MT Quality and Inequity from Fertility and Metadata

Authors: Jessica M. Lundin, Ada Zhang, David Adelani, Cody Carroll

Abstract: We show that translation quality can be predicted with surprising accuracy without ever running the translation system itself. Using only a handful of features, namely token fertility ratios, token counts, and basic linguistic metadata (language family, script, and region), we can forecast ChrF scores for GPT-4o translations across 203 languages in the FLORES-200 benchmark. Gradient boosting models achieve R² = 0.66 for XX→English and R² = 0.72 for English→XX. Feature importance analyses reveal that typological factors dominate predictions into English, while fertility plays a larger role for translations into diverse target languages. These findings suggest that translation quality is shaped by both token-level fertility and broader linguistic typology, offering new insights for multilingual evaluation and quality estimation.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2509.05425 [cs.CL] (or ar...
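The text-free forecasting idea in the abstract can be illustrated with a minimal, dependency-free sketch. The pipeline is simple: compute features, fit gradient boosting, then score with R².

The sketch rests on several assumptions that the abstract does not confirm:

- **Fertility.** It is taken here to mean subword tokens per whitespace word.
- **Fertility ratio.** It is taken to be the fertility of language XX divided by English fertility on parallel text. The paper may define both quantities differently.
- **Model.** The booster is a toy built from depth-1 trees (stumps) under squared-error loss. It stands in for whatever gradient boosting library the authors actually used.
- **Names and data.** All function names (`fertility_ratio`, `fit_stump`, `fit_gbm`, `predict_gbm`, `r_squared`) and the data rows are hypothetical.

```python
# Toy sketch: forecast a chrF-like score from text-free features using
# gradient-boosted decision stumps (squared-error loss), then report R^2.
# Feature definitions, data, and all function names are hypothetical.
from statistics import mean


def fertility(num_tokens, num_words):
    """Subword tokens per whitespace-delimited word (assumed definition)."""
    return num_tokens / num_words


def fertility_ratio(tokens_xx, words_xx, tokens_en, words_en):
    """Fertility of language XX relative to English on parallel text (assumed)."""
    return fertility(tokens_xx, words_xx) / fertility(tokens_en, words_en)


def fit_stump(X, residuals):
    """Return the (feature, threshold, left value, right value) split that
    minimises the squared error of the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No valid split (all rows identical): predict the mean residual.
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbm(X, y, n_rounds=100, lr=0.1):
    """Gradient boosting for squared loss: each stump is fit to the current
    residuals y - f(x), i.e. the negative gradient of 0.5 * (y - f(x))**2."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        pred = [p + lr * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict_gbm(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic rows: [fertility ratio vs. English, 1 if Latin script else 0].
    X = [[1.0, 1], [1.2, 1], [1.5, 1], [2.0, 0], [2.5, 0], [3.0, 0]]
    y = [60.0, 57.0, 52.0, 45.0, 40.0, 30.0]  # made-up chrF-like targets
    model = fit_gbm(X, y)
    preds = [predict_gbm(model, row) for row in X]
    print(f"in-sample R^2 = {r_squared(y, preds):.3f}")
```

The R² printed here is computed in-sample on six made-up rows. It says nothing about the R² = 0.66 and 0.72 reported above. A faithful evaluation would score predictions on held-out languages, for example with cross-validation.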