[2602.16811] Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

arXiv - AI · 4 min read

Summary

This article presents DemosQA, a benchmark for evaluating monolingual and multilingual large language models (LLMs) on Greek question answering, and highlights the need for better resources in under-represented languages.

Why It Matters

The study addresses the gap in research on LLMs for under-resourced languages like Greek. By developing the DemosQA dataset and evaluation framework, it aims to enhance the effectiveness of QA systems, promoting inclusivity in AI and ensuring diverse cultural representation.

Key Takeaways

  • Introduces DemosQA, a dataset tailored for Greek question answering.
  • Evaluates 11 LLMs, comparing monolingual and multilingual performance.
  • Highlights the importance of culturally relevant datasets in AI training.
  • Provides a memory-efficient evaluation framework adaptable to various languages.
  • Releases code and data to support reproducibility in research.

Abstract

Computer Science > Computation and Language · arXiv:2602.16811 (cs) · Submitted on 18 Feb 2026
Authors: Charalampos Mastrokostas, Nikolaos Giarelis, Nikos Karacapilidis

Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite these advancements, research on LLMs has primarily targeted high-resourced languages (e.g., English), and only recently has attention shifted toward multilingual models. However, these models demonstrate a training data bias towards a small number of popular languages or rely on transfer learning from high- to under-resourced languages; this may lead to a misrepresentation of social, cultural, and historical aspects. To address this challenge, monolingual LLMs have been developed for under-resourced languages; however, their effectiveness remains less studied when compared to multilingual counterparts on language-specific tasks. In this study, we address this research gap in Greek QA by contributing: (i) DemosQA, a novel dataset, ...
