[2508.13404] TASER: Table Agents for Schema-guided Extraction and Recommendation

Summary

The paper presents TASER, a system designed for schema-guided extraction and recommendation from complex financial tables, improving data normalization and extraction accuracy.

Why It Matters

TASER addresses the challenge of extracting critical financial data from unstructured tables, which is essential for risk assessment and decision-making in finance. By enhancing extraction processes, it supports better data management and analysis in financial contexts.

Key Takeaways

  • TASER improves extraction from complex financial tables by integrating continuous learning.
  • The system outperforms vision-based table detection models by 10.1%.
  • Larger batch sizes significantly enhance schema recommendations and extraction totals.
  • The authors manually labeled over 22,000 pages for the study.
  • TASER facilitates better data normalization, crucial for financial analysis.

Computer Science > Artificial Intelligence
arXiv:2508.13404 (cs)
[Submitted on 18 Aug 2025 (v1), last revised 23 Feb 2026 (this version, v4)]

Title: TASER: Table Agents for Schema-guided Extraction and Recommendation

Authors: Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso

Abstract: Real-world financial filings report critical information about an entity's investment holdings, essential for assessing that entity's risk, profitability, and relationship profile. Yet, these details are often buried in messy, multi-page, fragmented tables that are difficult to parse, hindering downstream QA and data normalization. Specifically, 99.4% of the tables in our financial table dataset lack bounding boxes, with the largest table spanning 44 pages. To address this, we present TASER (Table Agents for Schema-guided Extraction and Recommendation), a continuously learning, agentic table extraction system that converts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Guided by an initial portfolio schema, TASER executes table detection, classification, extraction, and recommendations in a single pipeline. Our Recommender Agent reviews unmatched outputs and proposes schema revisions, enabling TASER to outperform vision-based table detection models such as Tab...
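The loop the abstract describes (extract against a schema, collect what does not match, propose schema revisions) can be illustrated with a toy sketch. This is not the paper's implementation: the agents are replaced here by simple rule-based header matching, and every name in it (`SCHEMA`, `extract`, `recommend`, `apply_recommendations`, the sample rows) is a hypothetical chosen for illustration.

```python
from collections import Counter

# Hypothetical initial portfolio schema: canonical field -> accepted header aliases.
SCHEMA = {
    "security_name": {"security", "issuer", "name"},
    "market_value": {"value", "market value", "fair value"},
}

def normalize_header(header):
    """Lowercase a header and collapse runs of whitespace."""
    return " ".join(header.lower().split())

def extract(rows, schema):
    """Map raw rows onto schema fields; return (records, unmatched header counts)."""
    alias_to_field = {
        alias: field for field, aliases in schema.items() for alias in aliases
    }
    records, unmatched = [], Counter()
    for row in rows:
        record = {}
        for header, cell in row.items():
            key = normalize_header(header)
            field = alias_to_field.get(key)
            if field is None:
                unmatched[key] += 1  # left for the recommender to review
            else:
                record[field] = cell
        records.append(record)
    return records, unmatched

def recommend(unmatched, min_count=2):
    """Propose new fields for unmatched headers seen at least min_count times."""
    return sorted(h for h, n in unmatched.items() if n >= min_count)

def apply_recommendations(schema, proposals):
    """Return a revised copy of the schema with one new field per proposal."""
    revised = {field: set(aliases) for field, aliases in schema.items()}
    for header in proposals:
        revised[header.replace(" ", "_")] = {header}
    return revised

# Made-up sample batch of two table rows with inconsistent headers.
rows = [
    {"Issuer": "Acme Corp", "Fair Value": "1,200", "Maturity Date": "2030-01-01"},
    {"Security": "Beta Ltd", "Value": "850", "Maturity  date": "2031-06-30", "Notes": "n/a"},
]
records, unmatched = extract(rows, SCHEMA)
proposals = recommend(unmatched)
revised = apply_recommendations(SCHEMA, proposals)
records_after, unmatched_after = extract(rows, revised)
```

In this sketch, a header is only proposed once it recurs across the batch, so a larger batch gives the recommender more evidence per header. That is a property of this toy threshold, not a claim about how TASER's Recommender Agent decides.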
