[2602.13812] DTBench: A Synthetic Benchmark for Document-to-Table Extraction

arXiv - AI · 4 min read

Summary

DTBench introduces a synthetic benchmark for evaluating document-to-table extraction capabilities, addressing limitations in existing benchmarks and highlighting performance gaps in large language models.

Why It Matters

As document-to-table extraction becomes increasingly important for data analytics, DTBench provides a structured evaluation framework that helps researchers and developers understand the capabilities and limitations of current models, fostering advancements in the field.

Key Takeaways

  • DTBench offers a capability-aware benchmark for Doc2Table extraction.
  • The benchmark reveals significant performance gaps among mainstream LLMs.
  • It emphasizes the importance of reasoning, faithfulness, and conflict resolution in extraction tasks.
  • DTBench is publicly available, promoting further research and development.
  • The benchmark's two-level taxonomy organizes extraction capabilities into 5 major categories and 13 subcategories.

Computer Science > Databases · arXiv:2602.13812 (cs) · Submitted on 14 Feb 2026

Title: DTBench: A Synthetic Benchmark for Document-to-Table Extraction

Authors: Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, Yunjun Gao

Abstract: Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in flexible information extraction, their ability to produce precisely structured tables remains insufficiently understood, particularly for indirect extraction that requires complex capabilities such as reasoning and conflict resolution. Existing benchmarks neither explicitly distinguish nor comprehensively cover the diverse capabilities required in Doc2Table extraction. We argue that a capability-aware benchmark is essential for systematic evaluation. However, constructing such benchmarks from human-annotated document-table pairs is costly, difficult to scale, and limited in capability coverage. To address this, we adopt a reverse Table2Doc paradigm and design a multi-agent synthesis workflow to generate documents from ground-truth tables. Based on this approach, we present DTBench, a synthetic benchmark that adopts a proposed two-level taxonomy of Doc2Table capabilities...
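To make the Doc2Table evaluation setting concrete, the sketch below compares an extracted table against a ground-truth table cell by cell under a shared target schema. This is an illustrative scoring routine only; the function name, schema, and sample rows are invented for demonstration, and the paper's actual DTBench metrics are not described here.

```python
def cell_accuracy(extracted, ground_truth, schema):
    """Fraction of ground-truth cells reproduced exactly, assuming
    row-aligned tables represented as lists of dicts keyed by schema columns.

    Hypothetical helper for illustration -- not DTBench's official metric."""
    total = correct = 0
    for gt_row, ex_row in zip(ground_truth, extracted):
        for col in schema:
            total += 1
            if ex_row.get(col) == gt_row.get(col):
                correct += 1
    return correct / total if total else 0.0


# Invented example: a target schema and two rows of ground truth.
schema = ["company", "year", "revenue"]
ground_truth = [
    {"company": "Acme", "year": 2023, "revenue": 12.5},
    {"company": "Beta", "year": 2023, "revenue": 8.0},
]
# A model's extraction with one wrong cell (the year in the second row).
extracted = [
    {"company": "Acme", "year": 2023, "revenue": 12.5},
    {"company": "Beta", "year": 2022, "revenue": 8.0},
]
print(cell_accuracy(extracted, ground_truth, schema))  # 5 of 6 cells match
```

A cell-level score like this rewards partially correct tables; stricter row- or table-level exact-match variants are equally easy to derive from the same representation.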
