[2602.13812] DTBench: A Synthetic Benchmark for Document-to-Table Extraction
Summary
DTBench introduces a synthetic benchmark for evaluating document-to-table extraction capabilities, addressing limitations in existing benchmarks and highlighting performance gaps in large language models.
Why It Matters
As document-to-table extraction becomes increasingly important for data analytics, DTBench provides a structured evaluation framework that helps researchers and developers understand the capabilities and limitations of current models, fostering advancements in the field.
Key Takeaways
- DTBench offers a capability-aware benchmark for Doc2Table extraction.
- The benchmark reveals significant performance gaps among mainstream LLMs.
- It emphasizes the importance of reasoning, faithfulness, and conflict resolution in extraction tasks.
- DTBench is publicly available, promoting further research and development.
- The benchmark's two-level taxonomy organizes extraction capabilities into 5 major categories and 13 subcategories.
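To make the "performance gap" idea concrete, here is a minimal sketch of how extraction quality could be scored against a ground-truth table under a fixed target schema. This is a hypothetical cell-level exact-match metric for illustration only, not DTBench's official evaluation protocol; the schema and rows below are invented.

```python
def table_accuracy(gold, pred, schema):
    """Fraction of cells reproduced exactly, over all rows and schema columns.

    gold/pred: lists of dicts (one dict per row, keyed by column name).
    Rows are compared positionally; missing or extra rows count as wrong cells.
    """
    total = correct = 0
    for gold_row, pred_row in zip(gold, pred):
        for col in schema:
            total += 1
            if pred_row.get(col) == gold_row.get(col):
                correct += 1
    # Penalize a row-count mismatch by counting the unmatched rows' cells as wrong.
    total += abs(len(gold) - len(pred)) * len(schema)
    return correct / total if total else 1.0

# Toy example: the extracted table gets one cell wrong (year of row B).
schema = ["name", "year", "score"]
gold = [{"name": "A", "year": 2020, "score": 0.9},
        {"name": "B", "year": 2021, "score": 0.8}]
pred = [{"name": "A", "year": 2020, "score": 0.9},
        {"name": "B", "year": 2022, "score": 0.8}]
print(table_accuracy(gold, pred, schema))  # 5 of 6 cells match
```

Aggregating such a score per capability subcategory is what would surface model-by-model gaps in a capability-aware benchmark.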
Computer Science > Databases
arXiv:2602.13812 (cs) [Submitted on 14 Feb 2026]
Title: DTBench: A Synthetic Benchmark for Document-to-Table Extraction
Authors: Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, Yunjun Gao
Abstract: Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in flexible information extraction, their ability to produce precisely structured tables remains insufficiently understood, particularly for indirect extraction that requires complex capabilities such as reasoning and conflict resolution. Existing benchmarks neither explicitly distinguish nor comprehensively cover the diverse capabilities required in Doc2Table extraction. We argue that a capability-aware benchmark is essential for systematic evaluation. However, constructing such benchmarks using human-annotated document-table pairs is costly, difficult to scale, and limited in capability coverage. To address this, we adopt a reverse Table2Doc paradigm and design a multi-agent synthesis workflow to generate documents from ground-truth tables. Based on this approach, we present DTBench, a synthetic benchmark that adopts a proposed two-level taxonomy of Doc2Table capabilities...
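The reverse Table2Doc idea can be sketched in a few lines: start from a ground-truth table and generate document text whose facts agree with the table by construction, so no human annotation is needed. The template-based renderer below is a deliberately simplified stand-in; the paper describes a multi-agent LLM synthesis workflow, and the field names and template here are hypothetical.

```python
def table_to_doc(rows, template):
    """Render each row of a ground-truth table into a sentence.

    Because every sentence is generated from the table, the table is a
    guaranteed-correct extraction target for the resulting document.
    """
    return " ".join(template.format(**row) for row in rows)

# Hypothetical ground-truth table and sentence template.
rows = [
    {"model": "Model-X", "task": "QA", "accuracy": 87.5},
    {"model": "Model-Y", "task": "QA", "accuracy": 91.2},
]
template = "{model} achieved {accuracy}% accuracy on the {task} task."
doc = table_to_doc(rows, template)
print(doc)
```

A real synthesis workflow would go further, e.g. paraphrasing, scattering facts across paragraphs, or injecting conflicts to exercise the reasoning and conflict-resolution capabilities the benchmark targets.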