[2602.14517] Beyond Translation: Evaluating Mathematical Reasoning Capabilities of LLMs in Sinhala and Tamil

Summary

This paper evaluates the mathematical reasoning of large language models (LLMs) in Sinhala and Tamil. It finds that basic arithmetic transfers robustly across languages, while complex reasoning tasks degrade significantly relative to English.

Why It Matters

Understanding how LLMs perform in low-resource languages like Sinhala and Tamil is crucial for developing equitable AI technologies. This study challenges the assumption that strong multilingual performance translates to effective reasoning across all languages, highlighting the need for tailored evaluations.

Key Takeaways

  • LLMs show robust performance in basic arithmetic across languages.
  • Complex reasoning tasks reveal significant degradation in Tamil and Sinhala.
  • Model performance varies by problem type, indicating non-uniform reasoning capabilities.
  • The study emphasizes the importance of fine-grained evaluations in multilingual contexts.
  • Findings challenge assumptions about multilingual competence in AI models.
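
The fine-grained evaluation named in the takeaways above can be pictured as an accuracy table broken down by language and problem type, followed by a per-type comparison against English. The snippet below is a minimal sketch under stated assumptions, not the authors' evaluation code: the record layout, the function names, and the toy records are all hypothetical placeholders and are not results from the paper. Only the problem-type labels "basic arithmetic" and "optimization" come from the abstract.

```python
from collections import defaultdict


def accuracy_by_language_and_type(records):
    """Return accuracy for each (language, problem_type) pair.

    `records` is an iterable of (language, problem_type, is_correct)
    tuples. This layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for language, problem_type, is_correct in records:
        key = (language, problem_type)
        totals[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / totals[key] for key in totals}


def gap_vs_reference(accuracy, reference="English"):
    """Return reference accuracy minus target accuracy per problem type.

    A positive gap means the model did worse in the target language
    than in the reference language on that problem type.
    """
    gaps = {}
    for (language, problem_type), acc in accuracy.items():
        if language == reference:
            continue
        ref_acc = accuracy.get((reference, problem_type))
        if ref_acc is not None:
            gaps[(language, problem_type)] = ref_acc - acc
    return gaps


# Made-up placeholder records, NOT data from the paper.
toy_records = [
    ("English", "basic arithmetic", True),
    ("English", "basic arithmetic", True),
    ("Tamil", "basic arithmetic", True),
    ("Tamil", "basic arithmetic", True),
    ("English", "optimization", True),
    ("English", "optimization", True),
    ("Tamil", "optimization", True),
    ("Tamil", "optimization", False),
]

accuracy = accuracy_by_language_and_type(toy_records)
gaps = gap_vs_reference(accuracy)
```

Reporting the gap per problem type, rather than one aggregate score per language, is what lets a non-uniform pattern show up: an average over all types could hide a large drop on one category behind strong results on another.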

arXiv:2602.14517 (cs), Computer Science > Computation and Language
Submitted on 16 Feb 2026

Title: Beyond Translation: Evaluating Mathematical Reasoning Capabilities of LLMs in Sinhala and Tamil

Authors: Sukumar Kishanthan, Kumar Thushalika, Buddhi Jayasekara, Asela Hevapathige

Abstract

Large language models (LLMs) demonstrate strong mathematical reasoning in English, but whether these capabilities reflect genuine multilingual reasoning or reliance on translation-based processing in low-resource languages like Sinhala and Tamil remains unclear. We examine this fundamental question by evaluating whether LLMs genuinely reason mathematically in these languages or depend on implicit translation to English-like representations. Using a taxonomy of six math problem types, from basic arithmetic to complex unit conflict and optimization problems, we evaluate four prominent large language models. To avoid translation artifacts that confound language ability with translation quality, we construct a parallel dataset where each problem is natively authored by fluent speakers with mathematical training in all three languages. Our analysis demonstrates that while basic arithmetic reasoning transfers robustly across languages, complex reasoning tasks show significant degradation in Tamil and S...
