[2603.24955] Toward domain-specific machine translation and quality estimation systems
Computer Science > Computation and Language
arXiv:2603.24955 (cs)
[Submitted on 26 Mar 2026]
Title: Toward domain-specific machine translation and quality estimation systems
Authors: Javad Pourmostafa Roshan Sharami
Abstract: Machine Translation (MT) and Quality Estimation (QE) perform well in general domains but degrade under domain mismatch. This dissertation studies how to adapt MT and QE systems to specialized domains through a set of data-focused contributions. Chapter 2 presents a similarity-based data selection method for MT. Small, targeted in-domain subsets outperform much larger generic datasets and reach strong translation quality at lower computational cost. Chapter 3 introduces a staged QE training pipeline that combines domain adaptation with lightweight data augmentation. The method improves performance across domains, languages, and resource settings, including zero-shot and cross-lingual cases. Chapter 4 studies the role of subword tokenization and vocabulary in fine-tuning. Aligned tokenization-vocabulary setups lead to stable training and better translation quality, while mismatched configurations reduce performance. Chapter 5 proposes a QE-guided in-context learning method for large language models. QE models select examples that improve translation quality without parameter updates and o...