[2602.16811] Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark
Summary
This paper introduces the DemosQA benchmark for evaluating monolingual and multilingual large language models (LLMs) on Greek question answering (QA), highlighting the need for better resources in under-represented languages.
Why It Matters
The study addresses the gap in LLM research for under-resourced languages such as Greek. By developing the DemosQA dataset and an accompanying evaluation framework, it aims to improve the effectiveness of Greek QA systems, promote inclusivity in AI, and support diverse cultural representation.
Key Takeaways
- Introduces DemosQA, a dataset tailored for Greek question answering.
- Evaluates 11 LLMs, comparing monolingual and multilingual performance.
- Highlights the importance of culturally relevant datasets in AI training.
- Provides a memory-efficient evaluation framework adaptable to various languages.
- Releases code and data to support reproducibility in research.
arXiv:2602.16811 [cs.CL] (Submitted on 18 Feb 2026)
Authors: Charalampos Mastrokostas, Nikolaos Giarelis, Nikos Karacapilidis
Abstract
Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite these advancements, research on LLMs has primarily targeted high-resourced languages (e.g., English), and only recently has attention shifted toward multilingual models. However, these models demonstrate a training data bias towards a small number of popular languages or rely on transfer learning from high- to under-resourced languages; this may lead to a misrepresentation of social, cultural, and historical aspects. To address this challenge, monolingual LLMs have been developed for under-resourced languages; however, their effectiveness remains less studied when compared to multilingual counterparts on language-specific tasks. In this study, we address this research gap in Greek QA by contributing: (i) DemosQA, a novel dataset, ...