[2603.27223] EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams

arXiv - AI March 31, 2026 4 min read

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.27223 (cs)

[Submitted on 28 Mar 2026]

Title: EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams

Authors: JaeSeong Kim, Chaehwan Lim, Sang Hyun Gil, Suan Lee

Abstract: We present EuraGovExam, a multilingual and multimodal benchmark sourced from real-world civil service examinations across five representative Eurasian regions: South Korea, Japan, Taiwan, India, and the European Union. Designed to reflect the authentic complexity of public-sector assessments, the dataset contains over 8,000 high-resolution scanned multiple-choice questions covering 17 diverse academic and administrative domains.

Unlike existing benchmarks, EuraGovExam embeds all question content--including problem statements, answer choices, and visual elements--within a single image, providing only a minimal standardized instruction for answer formatting. This design demands that models perform layout-aware, cross-lingual reasoning directly from visual input. All items are drawn from real exam documents, preserving rich visual structures such as tables, multilingual typography, and form-like layouts.

Evaluation results show that even state-of-the-art vision-language models (VLMs) achieve only 86% accuracy, underscoring the benchma...

Originally published on March 31, 2026. Curated by AI News.
