[2510.27543] DialectalArabicMMLU: Benchmarking Dialectal Capabilities

[2510.27543] DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

arXiv - AI March 24, 2026 4 min read

About this article

Abstract page for arXiv paper 2510.27543: DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

Computer Science > Computation and Language arXiv:2510.27543 (cs) [Submitted on 31 Oct 2025 (v1), last revised 21 Mar 2026 (this version, v2)] Title:DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models Authors:Malik H. Altakrori, Nizar Habash, Abed Alhakim Freihat, Younes Samih, Kirill Chirkunov, Muhammed AbuOdeh, Radu Florian, Teresa Lynn, Preslav Nakov, Alham Fikri Aji View a PDF of the paper titled DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models, by Malik H. Altakrori and 9 other authors View PDF Abstract:We present DialectalArabicMMLU, a new benchmark for evaluating the performance of large language models (LLMs) across Arabic dialects. While recently developed Arabic and multilingual benchmarks have advanced LLM evaluation for Modern Standard Arabic (MSA), dialectal varieties remain underrepresented despite their prevalence in everyday communication. DialectalArabicMMLU extends the MMLU-Redux framework through manual translation and adaptation of 3K multiple-choice question-answer pairs into five major dialects (Syrian, Egyptian, Emirati, Saudi, and Moroccan), yielding a total of 15K QA pairs across 32 academic and professional domains (22K QA pairs when also including English and MSA). The benchmark enables systematic assessment of LLM reasoning and comprehension beyond MSA, supporting both task-based and linguistic analysis. We evaluate 19 open-weight Arabic and multil...

Originally published on March 24, 2026. Curated by AI News.

Llms

We hit 150 stars on our AI setup tool!

yo folks, we just hit 150 stars on our open source tool that auto makes AI context files. got 90 PRs merged and 20 issues that ppl are pi...

Reddit - Artificial Intelligence · 1 min · 4 minutes ago

Llms

Is ai getting dummer?

Over the past month, it feels like GPT and Gemini have been giving wrong answers a lot. Do you feel the same, or am I exaggerating? submi...

Reddit - Artificial Intelligence · 1 min · 4 minutes ago

Llms

If AI is really making us more productive... why does it feel like we are working more, not less...?

The promise of AI was the ultimate system optimisation: Efficiency. On paper, the tools are delivering something similar to what they pro...

Reddit - Artificial Intelligence · 1 min · 4 minutes ago

Llms

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min · about 8 hours ago

[2510.27543] DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

About this article

Related Articles

We hit 150 stars on our AI setup tool!

Is ai getting dummer?

If AI is really making us more productive... why does it feel like we are working more, not less...?

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

No comments

Stay updated with AI News