[2601.21225] MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

arXiv - AI April 29, 2026 4 min read

About this article

Abstract page for arXiv paper 2601.21225: MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

Computer Science > Computation and Language

arXiv:2601.21225 (cs)

[Submitted on 29 Jan 2026 (v1), last revised 28 Apr 2026 (this version, v2)]

Title: MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

Authors: Tianyi Xu, Kosei Uemura, Alfred Malengo Kondoro, Tadesse Destaw Belay, Catherine Nana Nyaah Essuman, Ifeoma Okoh, Ganiyat Afolabi, Ayodele Awokoya, David Ifeoluwa Adelani

Abstract: Large language models have made substantial progress in mathematical reasoning. However, benchmark development for multilingual evaluation has lagged behind English in both difficulty and recency. Recently, GSM-Symbolic showed strong evidence of high variance when models are evaluated on different instantiations of the same question; however, that evaluation was conducted only in English. In this paper, we introduce MGSM-Pro, an extension of the MGSM dataset using the GSM-Symbolic approach. Our dataset provides five instantiations per MGSM question by varying names, digits, and irrelevant context. Evaluations across nine languages reveal that many low-resource languages suffer large performance drops when tested on digit instantiations different from those in the original test set. We further find that model robustness in high-resource language (HRL) settings does not necessarily translate to low-resource languages (LRLs). Moreover, proprieta...
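The instantiation idea described in the abstract (several variants of one question, differing in names, digits, and irrelevant context) can be illustrated with a small sketch. This is not the authors' code or data: the template, the name list, the distractor sentence, and the function names `instantiate` and `make_variants` are all hypothetical, chosen only to show how a symbolic template might yield five variants with recomputed gold answers.

```python
import random

# Hypothetical template, NOT taken from MGSM or MGSM-Pro.
TEMPLATE = (
    "{name} has {a} apples and buys {b} more. "
    "{distractor}How many apples does {name} have now?"
)
NAMES = ["Amina", "Kofi", "Yuki", "Lucas"]      # assumed name pool
DISTRACTOR = "The shop is {d} km away. "        # assumed irrelevant context


def instantiate(rng, vary_names=True, vary_digits=True, add_irrelevant=False):
    """Build one (question, gold_answer) pair from the template."""
    name = rng.choice(NAMES) if vary_names else "Amina"
    if vary_digits:
        a, b = rng.randint(2, 99), rng.randint(2, 99)
    else:
        a, b = 5, 3  # stand-in for the "original" digits
    distractor = DISTRACTOR.format(d=rng.randint(1, 20)) if add_irrelevant else ""
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    # The gold answer is recomputed from the sampled digits.
    return question, a + b


def make_variants(seed, n=5, **options):
    """Return n instantiations of the same underlying question."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    return [instantiate(rng, **options) for _ in range(n)]


if __name__ == "__main__":
    for question, answer in make_variants(seed=0, add_irrelevant=True):
        print(answer, "|", question)
```

Scoring a model separately on each of the five variant sets, rather than on a single fixed test set, is what would expose the kind of variance across instantiations that the abstract reports.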

Originally published on April 29, 2026. Curated by AI News.
