[2602.16811] Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark
Summary
This paper introduces the DemosQA benchmark for evaluating monolingual and multilingual large language models (LLMs) on Greek question answering (QA), highlighting the need for better resources in under-represented languages.
Why It Matters
The study addresses the gap in LLM research for under-resourced languages such as Greek. By developing the DemosQA dataset and an accompanying evaluation framework, it aims to improve the effectiveness of Greek QA systems, promote inclusivity in AI, and support diverse cultural representation.
Key Takeaways
- Introduces DemosQA, a dataset tailored for Greek question answering.
- Evaluates 11 LLMs, comparing monolingual and multilingual performance.
- Highlights the importance of culturally relevant datasets in AI training.
- Provides a memory-efficient evaluation framework adaptable to various languages.
- Releases code and data to support reproducibility in research.
arXiv:2602.16811 [cs.CL] (Submitted on 18 Feb 2026)
Authors: Charalampos Mastrokostas, Nikolaos Giarelis, Nikos Karacapilidis
Abstract
Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite these advancements, research on LLMs has primarily targeted high-resourced languages (e.g., English), and only recently has attention shifted toward multilingual models. However, these models demonstrate a training data bias towards a small number of popular languages or rely on transfer learning from high- to under-resourced languages; this may lead to a misrepresentation of social, cultural, and historical aspects. To address this challenge, monolingual LLMs have been developed for under-resourced languages; however, their effectiveness remains less studied when compared to multilingual counterparts on language-specific tasks. In this study, we address this research gap in Greek QA by contributing: (i) DemosQA, a novel dataset, ...