[2602.14763] Unlocking Reasoning Capability on Machine Translation in Large Language Models
Summary
The paper evaluates the impact of reasoning-oriented large language models on machine translation, finding that enabling explicit reasoning consistently degrades translation quality, and proposes a structured reasoning framework to recover and enhance performance.
Why It Matters
This research addresses the underexplored relationship between reasoning capabilities in large language models and their effectiveness in machine translation. By identifying why current reasoning traces fail to help and proposing a structured, task-tailored alternative, it points toward better translation systems, which matter for global communication and information exchange.
Key Takeaways
- Explicit reasoning in large language models can degrade translation quality.
- Current reasoning traces in machine translation are linear and lack depth.
- Higher-quality reasoning from stronger models does not necessarily improve weaker models.
- A structured reasoning framework can enhance translation performance.
- The study highlights the need for task-structured reasoning to benefit machine translation.
Computer Science > Computation and Language
arXiv:2602.14763 (cs)
[Submitted on 16 Feb 2026]
Title: Unlocking Reasoning Capability on Machine Translation in Large Language Models
Authors: Sara Rajaee, Sebastian Vincent, Alexandre Berard, Marzieh Fadaee, Kelly Marchisio, Tom Kocmi
Abstract: Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. However, their impact on machine translation (MT) remains underexplored. We systematically evaluate several open- and closed-weights RLMs on the WMT24++ benchmark and find that enabling explicit reasoning consistently degrades translation quality across languages and models. Analysis reveals that MT reasoning traces are highly linear, lacking revision, self-correction and exploration of alternative translations, which limits their usefulness. Furthermore, injecting higher-quality reasoning traces from stronger models does not reliably improve weaker models' performance. To address this mismatch, we propose a structured reasoning framework tailored to translation, based on multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision. We curate a synthetic dataset of dynamic structured reasoning traces and post-train a large reasonin...
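The abstract's four-stage framework (multi-step drafting, adequacy refinement, fluency improvement, selective iterative revision) can be pictured as a simple control loop. The sketch below is a hypothetical illustration, not the paper's implementation: `generate` is a stub standing in for an actual LLM call, and the revision check is faked so the control flow is runnable and inspectable.

```python
def generate(prompt: str) -> str:
    # Stand-in for a real LLM call. It just echoes the stage tag at the
    # start of the prompt so we can trace which stage ran.
    return prompt.split(":", 1)[0]

def structured_translate(source: str, max_revisions: int = 2) -> list[str]:
    """Run the four-stage loop and return the trace of stages executed.

    Stages follow the paper's described framework: draft -> adequacy
    refinement -> fluency improvement -> selective iterative revision.
    All prompts and the revision criterion are hypothetical.
    """
    trace = []
    draft = generate(f"draft: translate the source text: {source}")
    trace.append(draft)
    adequate = generate(f"adequacy: fix meaning errors in: {draft}")
    trace.append(adequate)
    fluent = generate(f"fluency: improve naturalness of: {adequate}")
    trace.append(fluent)
    # Selective revision: iterate only while a quality check flags issues.
    for i in range(max_revisions):
        needs_revision = i == 0  # stub check: pretend one pass is needed
        if not needs_revision:
            break
        trace.append(generate(f"revise: pass {i} on: {fluent}"))
    return trace
```

The point of the loop structure is that revision is *selective*: unlike the linear traces the paper criticizes, the final stage re-enters only when a check fails, so the model can stop early or explore further as needed.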