[2601.21225] MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

Computer Science > Computation and Language
arXiv:2601.21225 (cs)
[Submitted on 29 Jan 2026 (v1), last revised 28 Apr 2026 (this version, v2)]

Title: MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

Authors: Tianyi Xu, Kosei Uemura, Alfred Malengo Kondoro, Tadesse Destaw Belay, Catherine Nana Nyaah Essuman, Ifeoma Okoh, Ganiyat Afolabi, Ayodele Awokoya, David Ifeoluwa Adelani

Abstract: Large language models have made substantial progress in mathematical reasoning. However, benchmark development for multilingual evaluation has lagged behind English in both difficulty and recency. Recently, GSM-Symbolic showed strong evidence of high variance when models are evaluated on different instantiations of the same question; however, that evaluation was conducted only in English. In this paper, we introduce MGSM-Pro, an extension of the MGSM dataset using the GSM-Symbolic approach. Our dataset provides five instantiations per MGSM question by varying names, digits, and irrelevant context. Evaluations across nine languages reveal that many low-resource languages suffer large performance drops when tested on digit instantiations different from those in the original test set. We further find that model robustness in high-resource language (HRL) settings does not necessarily translate to low-resource languages (LRLs). Moreover, proprieta...
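The idea of producing several instantiations of one question by varying names, digits, and irrelevant context can be illustrated with a small sketch. This is not the authors' pipeline: the template, the name list, the distractor sentences, and the helper names (`instantiate`, `make_variants`, `accuracy_spread`) are all hypothetical and chosen only to show the mechanism in English.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    question: str
    answer: int


# Hypothetical template: the braces mark the parts that vary per instantiation.
TEMPLATE = (
    "{name} has {a} apples and buys {b} more. {distractor}"
    "How many apples does {name} have now?"
)
# Hypothetical pools of names and irrelevant-context sentences.
NAMES = ["Amina", "Kofi", "Yuki", "Tadesse", "Maria"]
DISTRACTORS = ["", "The shop closes at 6 pm. ", "Last week it rained twice. "]


def instantiate(seed: int) -> Instance:
    """Build one variant; the gold answer is recomputed from the sampled digits."""
    rng = random.Random(seed)  # seeded so each variant is reproducible
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    distractor = rng.choice(DISTRACTORS)
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    return Instance(question=question, answer=a + b)


def make_variants(n: int = 5) -> list[Instance]:
    """Return n instantiations of the same underlying question."""
    return [instantiate(seed) for seed in range(n)]


def accuracy_spread(per_variant_accuracy: list[float]) -> float:
    """Gap between the best and worst accuracy across instantiations."""
    return max(per_variant_accuracy) - min(per_variant_accuracy)


if __name__ == "__main__":
    for inst in make_variants():
        print(inst.question, "->", inst.answer)
```

Scoring a model separately on each variant set and reporting the spread, rather than a single accuracy on the original digits, is one simple way to surface the kind of variance the abstract describes.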

Originally published on April 29, 2026. Curated by AI News.
