[2602.17687] IRPAPERS: A Visual Document Benchmark for Scientific Retrieval and Question Answering


arXiv - Machine Learning

Summary

The paper introduces IRPAPERS, a benchmark of scientific-paper pages for evaluating visual document retrieval and question answering, and uses it to compare image-based and text-based systems.

Why It Matters

As AI systems increasingly handle multimodal data, understanding the effectiveness of visual document processing is crucial. IRPAPERS provides a structured approach to evaluate and improve retrieval methods, which can enhance scientific research efficiency and accuracy.

Key Takeaways

  • IRPAPERS benchmark includes 3,230 pages from 166 scientific papers for testing retrieval systems.
  • Image-based retrieval shows comparable performance to text-based methods, highlighting the potential of multimodal approaches.
  • Hybrid systems combining text and image retrieval outperform unimodal systems, achieving higher recall rates.
  • The dataset and code are publicly available, promoting further research in visual document processing.
  • Different question types favor either text or image modalities, indicating the need for tailored retrieval strategies.
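The hybrid takeaway above can be illustrated with reciprocal rank fusion, a common way to merge rankings from a text retriever and an image retriever. The paper does not state which fusion method its hybrid systems use; this is a generic sketch with hypothetical page IDs:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each result earns 1/(k + rank) per list,
    so pages ranked highly by multiple retrievers rise to the top."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: the text and image retrievers disagree on the top page,
# but both rank "p03" near the top, so fusion promotes it.
text_ranking = ["p12", "p03", "p44"]
image_ranking = ["p03", "p44", "p12"]
print(reciprocal_rank_fusion([text_ranking, image_ranking]))
# → ['p03', 'p12', 'p44']
```

Because the modalities fail on different questions, a page missed by one retriever but found by the other can still surface after fusion, which is consistent with the higher recall reported for hybrid systems.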

Computer Science > Information Retrieval
arXiv:2602.17687 (cs) · Submitted on 5 Feb 2026

Title: IRPAPERS: A Visual Document Benchmark for Scientific Retrieval and Question Answering
Authors: Connor Shorten, Augustas Skaburskas, Daniel M. Jones, Charles Pierse, Roberto Esposito, John Trengrove, Etienne Dilocker, Bob van Luijt

Abstract: AI systems have achieved remarkable success in processing text and relational data, yet visual document processing remains relatively underexplored. Whereas traditional systems require OCR transcriptions to convert these visual documents into text and metadata, recent advances in multimodal foundation models offer retrieval and generation directly from document images. This raises a key question: how do image-based systems compare to established text-based methods? We introduce IRPAPERS, a benchmark of 3,230 pages from 166 scientific papers, with both an image and an OCR transcription for each page. Using 180 needle-in-the-haystack questions, we compare image- and text-based retrieval and question answering systems. Text retrieval using Arctic 2.0 embeddings, BM25, and hybrid text search achieved 46% Recall@1, 78% Recall@5, and 91% Recall@20, while image-based retrieval reaches 43%, 78%, and 93%, respectively. The two modalities exhibit complementary failures, ena...
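The Recall@k figures in the abstract measure the fraction of questions for which the gold page appears among the top k retrieved results. A minimal sketch of the metric (function names and data shapes are illustrative, not taken from the paper's code):

```python
def recall_at_k(ranked_page_ids, gold_page_id, k):
    """Return 1.0 if the gold page is in the top-k results, else 0.0."""
    return 1.0 if gold_page_id in ranked_page_ids[:k] else 0.0

def mean_recall_at_k(runs, k):
    """Average Recall@k over (ranking, gold page) pairs for a question set."""
    return sum(recall_at_k(ranking, gold, k) for ranking, gold in runs) / len(runs)

# Toy example: two questions with gold pages "p3" and "p9".
runs = [
    (["p3", "p1", "p7"], "p3"),  # gold page retrieved at rank 1
    (["p2", "p4", "p9"], "p9"),  # gold page retrieved at rank 3
]
print(mean_recall_at_k(runs, 1))  # → 0.5
print(mean_recall_at_k(runs, 3))  # → 1.0
```

With k = 1 only exact top hits count, which is why Recall@1 (46% text, 43% image) is far below Recall@20 (91% and 93%) in the reported results.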

