[2505.14381] SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

Summary

The paper presents SCAN, a novel approach for Semantic Document Layout Analysis that enhances Retrieval-Augmented Generation (RAG) systems, improving performance on visually rich documents.

Why It Matters

As Large Language Models and Vision-Language Models become integral to document processing, SCAN addresses the difficulty of analyzing complex, information-dense pages and reports significant performance improvements. This matters for applications in AI-driven document retrieval and processing.

Key Takeaways

  • SCAN improves both textual and visual RAG performance significantly.
  • The model utilizes a coarse-grained semantic approach for efficient document analysis.
  • Experimental results show performance gains of up to 10.4 points over conventional methods.
  • Fine-tuning on annotated datasets enhances the model's accuracy.
  • The approach benefits applications that work with visually rich documents.

arXiv:2505.14381 (cs), Computer Science > Artificial Intelligence
Submitted on 20 May 2025 (v1), last revised 13 Feb 2026 (this version, v3)

Authors: Nobuhiro Ueda, Yuyang Dong, Krisztián Boros, Daiki Ito, Takuya Sera, Masafumi Oyamada

Abstract: With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components. We trained the SCAN model by fine-tuning ob...
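
To make the idea of "coherent regions covering contiguous components" concrete, here is a minimal sketch in Python. It is not the SCAN model, which the abstract describes as a fine-tuned model. It is a simple heuristic stand-in that assumes fine-grained layout elements are already available in reading order and starts a new coarse region at each heading. The names `Element`, `Region`, `group_into_regions`, and `region_text` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class Element:
    """A fine-grained layout component (assumed input, not SCAN output)."""
    kind: str          # e.g. "heading", "paragraph", "table", "figure"
    box: Box
    text: str = ""


@dataclass
class Region:
    """A coarse region covering one or more contiguous elements."""
    box: Box
    elements: List[Element] = field(default_factory=list)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def group_into_regions(elements: List[Element]) -> List[Region]:
    """Heuristic assumption: each heading opens a new region, and every
    following element joins that region until the next heading."""
    regions: List[Region] = []
    for el in elements:  # elements are assumed to be in reading order
        if el.kind == "heading" or not regions:
            regions.append(Region(box=el.box, elements=[el]))
        else:
            current = regions[-1]
            current.elements.append(el)
            current.box = union_box(current.box, el.box)
    return regions


def region_text(region: Region) -> str:
    """Text chunk for a textual RAG index; region.box could instead be
    used to crop the page image for a visual RAG index."""
    return "\n".join(e.text for e in region.elements if e.text)


page = [
    Element("heading", (50, 40, 550, 70), "1. Revenue"),
    Element("paragraph", (50, 80, 550, 200), "Revenue grew."),
    Element("figure", (50, 210, 300, 400)),
    Element("heading", (50, 420, 550, 450), "2. Costs"),
    Element("table", (50, 460, 550, 700), "Item | Cost"),
]
regions = group_into_regions(page)
for r in regions:
    print(r.box, repr(region_text(r)))
```

Under these assumptions, each region keeps a heading together with the content it introduces, so a retriever sees one self-contained unit instead of either a whole page or isolated fragments. A learned model would replace the heading heuristic with predicted region boxes.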
