
[2505.14381] SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

arXiv - AI February 16, 2026 4 min read Article

Summary

The paper presents SCAN, a novel approach for Semantic Document Layout Analysis that enhances Retrieval-Augmented Generation (RAG) systems, improving performance on visually rich documents.

Why It Matters

As Large Language Models and Vision-Language Models become integral to document processing, a single page of a visually rich document can still hold more information than these models handle well. SCAN targets that problem by splitting pages into semantically coherent regions, and the reported gains make it relevant to AI-driven document retrieval and processing.

Key Takeaways

SCAN improves both textual and visual RAG performance significantly.
SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components, balancing context preservation with processing efficiency.
Experimental results show performance gains of up to 10.4 points over conventional methods.
Fine-tuning on annotated datasets enhances the model's accuracy.
The approach targets applications that work with visually rich documents.

arXiv:2505.14381 (Computer Science > Artificial Intelligence)

Submitted on 20 May 2025 (v1), last revised 13 Feb 2026 (this version, v3)

Title: SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

Authors: Nobuhiro Ueda, Yuyang Dong, Krisztián Boros, Daiki Ito, Takuya Sera, Masafumi Oyamada

Abstract: With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components. We trained the SCAN model by fine-tuning ob...
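To make the idea of coarse-grained regions concrete, here is a minimal sketch of how region boxes on a page could be turned into text chunks for a textual RAG index. It is not the authors' implementation: the Box and TextLine types, the chunk_by_region function, the center-point assignment rule, and the top-to-bottom ordering are all assumptions chosen for illustration, and the region boxes are assumed to come from some layout model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class TextLine:
    """One OCR line: its box and recognized text (hypothetical input)."""
    box: Box
    text: str


def center(box: Box) -> tuple[float, float]:
    """Return the center point of a box."""
    return ((box.x0 + box.x1) / 2, (box.y0 + box.y1) / 2)


def chunk_by_region(regions: list[Box], lines: list[TextLine]) -> list[str]:
    """Group OCR lines into one text chunk per coarse region.

    A line goes to the first region that contains its center point.
    Lines outside every region are dropped in this sketch.
    """
    # Assumed ordering: top-to-bottom, then left-to-right.
    # Real reading order on multi-column pages may differ.
    ordered = sorted(regions, key=lambda r: (r.y0, r.x0))
    buckets: list[list[str]] = [[] for _ in ordered]
    for line in sorted(lines, key=lambda l: (l.box.y0, l.box.x0)):
        cx, cy = center(line.box)
        for i, region in enumerate(ordered):
            if region.contains(cx, cy):
                buckets[i].append(line.text)
                break
    # Skip regions that received no text (e.g. a figure-only region).
    return [" ".join(bucket) for bucket in buckets if bucket]


if __name__ == "__main__":
    regions = [Box(0, 100, 200, 200), Box(0, 0, 200, 90)]
    lines = [
        TextLine(Box(10, 110, 100, 120), "Body text."),
        TextLine(Box(10, 10, 100, 20), "Title"),
        TextLine(Box(10, 30, 100, 40), "Subtitle"),
    ]
    print(chunk_by_region(regions, lines))
```

Each returned string corresponds to one region, so contiguous components stay together in a single retrieval unit. For visual RAG, the same boxes could instead be used to crop the page image into per-region images.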
