[2603.27223] EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams

arXiv:2603.27223 (cs), Computer Science > Computer Vision and Pattern Recognition

Submitted on 28 Mar 2026

Authors: JaeSeong Kim, Chaehwan Lim, Sang Hyun Gil, Suan Lee

Abstract: We present EuraGovExam, a multilingual and multimodal benchmark sourced from real-world civil service examinations across five representative Eurasian regions: South Korea, Japan, Taiwan, India, and the European Union. Designed to reflect the authentic complexity of public-sector assessments, the dataset contains over 8,000 high-resolution scanned multiple-choice questions covering 17 diverse academic and administrative domains. Unlike existing benchmarks, EuraGovExam embeds all question content (problem statements, answer choices, and visual elements) within a single image, providing only a minimal standardized instruction for answer formatting. This design demands that models perform layout-aware, cross-lingual reasoning directly from visual input. All items are drawn from real exam documents, preserving rich visual structures such as tables, multilingual typography, and form-like layouts. Evaluation results show that even state-of-the-art vision-language models (VLMs) achieve only 86% accuracy, underscoring the benchma...
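The abstract describes scoring multiple-choice answers that models produce from a single image plus a short formatting instruction. As a rough illustration only, the sketch below shows how per-region accuracy might be computed from free-text model responses. Everything in it is an assumption rather than the authors' evaluation code: the function names `extract_choice` and `accuracy_by_region`, the option letters A to E, and the "Answer: X" response format are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed response format (hypothetical): the model states its choice as
# "Answer: B", "answer is (c)", or similar. The benchmark's real
# formatting instruction is not given in the abstract.
ANSWER_PATTERN = re.compile(
    r"answer\s*(?:is)?\s*[:\-]?\s*\(?([A-E])\b", re.IGNORECASE
)


def extract_choice(response: str) -> Optional[str]:
    """Return the last stated option letter (uppercased), or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].upper() if matches else None


def accuracy_by_region(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, float]:
    """Compute accuracy per region.

    Each record is (region, gold_letter, model_response). A response with
    no parseable answer counts as incorrect.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for region, gold, response in records:
        total[region] += 1
        if extract_choice(response) == gold.upper():
            correct[region] += 1
    return {region: correct[region] / total[region] for region in total}


if __name__ == "__main__":
    # Made-up records purely to exercise the functions; not benchmark data.
    demo = [
        ("Japan", "A", "Answer: A"),
        ("Japan", "B", "Answer: C"),
        ("India", "D", "The answer is (d)."),
    ]
    print(accuracy_by_region(demo))
```

Counting unparseable responses as wrong is one possible design choice; a stricter or looser parser would change the reported numbers, so any comparison should fix the extraction rule in advance.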

Originally published on March 31, 2026. Curated by AI News.
