[2603.28130] MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.28130 (cs)
[Submitted on 30 Mar 2026]

Title: MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Authors: Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu

Abstract: We introduce Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluate how models perform on digital and photographed documents across diverse scripts and low-resource languages. MDPBench comprises 3,400 document images spanning 17 languages, diverse scripts, and varied photographic conditions, with high-quality annotations produced through a rigorous pipeline of expert model labeling, manual correction, and human verification. To ensure fair comparison and prevent data leakage, we maintain separate public and private evaluation splits. Our comprehensive evaluation of both open-source and closed-source models uncovers a striking finding: while closed-source models (notably Gemini3-Pro) prove relatively robust, open-source alternative...