[2508.13404] TASER: Table Agents for Schema-guided Extraction and Recommendation
Summary
The paper presents TASER, a system designed for schema-guided extraction and recommendation from complex financial tables, improving data normalization and extraction accuracy.
Why It Matters
TASER addresses the challenge of extracting critical financial data from unstructured tables, which is essential for risk assessment and decision-making in finance. By enhancing extraction processes, it supports better data management and analysis in financial contexts.
Key Takeaways
- TASER improves extraction from complex financial tables by integrating continuous learning.
- In table detection accuracy, the system outperforms vision-based table detection models by 10.1%.
- Larger batch sizes significantly enhance schema recommendations and extraction totals.
- The study involved extensive manual labeling of over 22,000 pages to train the model.
- TASER facilitates better data normalization, crucial for financial analysis.
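The schema-guided extraction and recommendation loop described for TASER can be illustrated with a toy sketch. This is not the paper's implementation: TASER is an agentic system, whereas the code below uses plain dictionary lookups. Every name in it (`SCHEMA`, `extract`, `recommend`, `revise`, `min_support`, and the sample rows) is a hypothetical stand-in chosen for illustration. It shows only the general idea: map raw table headers onto an initial schema, collect whatever does not match, and propose schema additions for headers that recur across a batch.

```python
from collections import Counter

# Hypothetical initial portfolio schema: canonical field -> accepted header aliases.
SCHEMA = {
    "instrument": {"instrument", "security", "holding"},
    "market_value": {"market value", "value", "fair value"},
}

# Hypothetical batch of rows taken from three differently formatted tables.
BATCH = [
    {"Security": "ACME Corp", "Fair Value": 1200, "Maturity Date": "2030-01-01"},
    {"Holding": "Globex Bond", "Value": 800, "Maturity  date": "2027-06-30"},
    {"Instrument": "Initech Note", "Market Value": 500, "Coupon": "5%"},
]


def normalize(header):
    """Lowercase a header and collapse runs of whitespace."""
    return " ".join(header.lower().split())


def extract(rows, schema):
    """Map raw rows onto the schema; return (records, unmatched headers)."""
    alias_to_field = {a: f for f, aliases in schema.items() for a in aliases}
    records, unmatched = [], []
    for row in rows:
        record = {}
        for header, value in row.items():
            field = alias_to_field.get(normalize(header))
            if field is None:
                unmatched.append(normalize(header))  # left for the recommender
            else:
                record[field] = value
        records.append(record)
    return records, unmatched


def recommend(unmatched, min_support=2):
    """Propose new fields for headers seen at least min_support times in the batch."""
    counts = Counter(unmatched)
    return sorted(h for h, n in counts.items() if n >= min_support)


def revise(schema, proposals):
    """Return a copy of the schema extended with the proposed headers."""
    revised = {field: set(aliases) for field, aliases in schema.items()}
    for header in proposals:
        revised.setdefault(header.replace(" ", "_"), set()).add(header)
    return revised


records, unmatched = extract(BATCH, SCHEMA)
proposals = recommend(unmatched)
revised_schema = revise(SCHEMA, proposals)
records_after, unmatched_after = extract(BATCH, revised_schema)
```

In this toy setup, a header that appears in only one table of the batch never reaches `min_support`, so a batch containing a single table would yield no proposals at all. That gives a rough intuition, under these assumptions only, for why batch size could affect how many schema recommendations are produced.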
Computer Science > Artificial Intelligence
arXiv:2508.13404 (cs)
Submitted on 18 Aug 2025 (v1), last revised 23 Feb 2026 (this version, v4)
Authors: Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso
Abstract: Real-world financial filings report critical information about an entity's investment holdings, essential for assessing that entity's risk, profitability, and relationship profile. Yet, these details are often buried in messy, multi-page, fragmented tables that are difficult to parse, hindering downstream QA and data normalization. Specifically, 99.4% of the tables in our financial table dataset lack bounding boxes, with the largest table spanning 44 pages. To address this, we present TASER (Table Agents for Schema-guided Extraction and Recommendation), a continuously learning, agentic table extraction system that converts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Guided by an initial portfolio schema, TASER executes table detection, classification, extraction, and recommendations in a single pipeline. Our Recommender Agent reviews unmatched outputs and proposes schema revisions, enabling TASER to outperform vision-based table detection models such as Tab...