[2602.14517] Beyond Translation: Evaluating Mathematical Reasoning Capabilities of LLMs in Sinhala and Tamil
Summary
This article evaluates the mathematical reasoning capabilities of large language models (LLMs) in Sinhala and Tamil, revealing significant performance discrepancies compared to English.
Why It Matters
Understanding how LLMs perform in low-resource languages like Sinhala and Tamil is crucial for developing equitable AI technologies. This study challenges the assumption that strong multilingual performance on surface tasks translates into effective reasoning across all languages, highlighting the need for tailored, fine-grained evaluations.
Key Takeaways
- LLMs show robust performance in basic arithmetic across languages.
- Complex reasoning tasks reveal significant degradation in Tamil and Sinhala.
- Model performance varies by problem type, indicating non-uniform reasoning capabilities.
- The study emphasizes the importance of fine-grained evaluations in multilingual contexts.
- Findings challenge assumptions about multilingual competence in AI models.
Computer Science > Computation and Language
arXiv:2602.14517 (cs)
Submitted on 16 Feb 2026
Title: Beyond Translation: Evaluating Mathematical Reasoning Capabilities of LLMs in Sinhala and Tamil
Authors: Sukumar Kishanthan, Kumar Thushalika, Buddhi Jayasekara, Asela Hevapathige
Abstract: Large language models (LLMs) demonstrate strong mathematical reasoning in English, but whether these capabilities reflect genuine multilingual reasoning or reliance on translation-based processing in low-resource languages like Sinhala and Tamil remains unclear. We examine this fundamental question by evaluating whether LLMs genuinely reason mathematically in these languages or depend on implicit translation to English-like representations. Using a taxonomy of six math problem types, from basic arithmetic to complex unit-conflict and optimization problems, we evaluate four prominent large language models. To avoid translation artifacts that confound language ability with translation quality, we construct a parallel dataset in which each problem is natively authored in all three languages by fluent speakers with mathematical training. Our analysis demonstrates that while basic arithmetic reasoning transfers robustly across languages, complex reasoning tasks show significant degradation in Tamil and S...