[2602.21456] Revisiting Text Ranking in Deep Research
Summary
The paper 'Revisiting Text Ranking in Deep Research' explores the effectiveness of text ranking methods in deep research settings, focusing on the use of large language models and search APIs.
Why It Matters
Understanding text ranking methods is crucial for improving the performance of AI agents in deep research tasks. This paper addresses the limitations of existing black-box search APIs and provides insights into optimizing retrieval strategies, which can enhance information retrieval systems and AI applications.
Key Takeaways
- Agent-issued queries often resemble web-search syntax, influencing retrieval effectiveness.
- Passage-level retrieval units use a limited context budget more efficiently than document-level units.
- Re-ranking significantly enhances retrieval outcomes.
- Rewriting agent-issued queries into natural language improves matching with rankers trained on natural-language queries.
- The study utilizes the BrowseComp-Plus dataset for comprehensive evaluation.
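The retrieve-then-rerank setup the takeaways describe (passage-level units, a cheap first-stage retriever, then a finer re-ranker over the top-k) can be sketched as follows. Everything here is an illustrative stand-in: the corpus, the passage splitter, and both scoring functions are toy lexical scorers, not the retrievers or re-rankers evaluated in the paper.

```python
from collections import Counter

# Hypothetical toy corpus; in the paper this role is played by the
# fixed BrowseComp-Plus corpus.
documents = {
    "doc1": "Deep research agents issue search queries iteratively. "
            "Re-ranking the retrieved passages improves final answers.",
    "doc2": "Black-box web search APIs hide the ranking pipeline. "
            "Fixed corpora enable controlled and reproducible study.",
}

def split_passages(docs, size=8):
    """Split each document into fixed-size word windows (passage units)."""
    passages = []
    for doc_id, text in docs.items():
        words = text.split()
        for i in range(0, len(words), size):
            passages.append((doc_id, " ".join(words[i:i + size])))
    return passages

def retrieve(query, passages, k=3):
    """First stage: cheap set-overlap score over all passages, keep top-k."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(p.lower().split())), doc_id, p)
              for doc_id, p in passages]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

def rerank(query, candidates):
    """Second stage: a finer, term-frequency-weighted score, applied
    only to the shallow candidate pool (the 're-ranking depth')."""
    q_counts = Counter(query.lower().split())
    def score(passage):
        p_counts = Counter(passage.lower().split())
        return sum(q_counts[t] * p_counts[t] for t in q_counts)
    return sorted(candidates, key=lambda t: score(t[2]), reverse=True)

query = "re-ranking retrieved passages"
top = rerank(query, retrieve(query, split_passages(documents)))
```

The two-stage split mirrors the pipeline-configuration axis studied in the paper: the first stage trades accuracy for coverage over the whole corpus, while the re-ranker spends more computation on a small candidate set whose size (the re-ranking depth) is itself a tunable knob.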
Computer Science > Information Retrieval
arXiv:2602.21456 (cs)
[Submitted on 25 Feb 2026]
Title: Revisiting Text Ranking in Deep Research
Authors: Chuan Meng, Litu Ou, Sean MacAvaney, Jeff Dalton
Abstract: Deep research has emerged as an important task that aims to address hard queries through extensive open-web exploration. To tackle it, most prior work equips large language model (LLM)-based agents with opaque web search APIs, enabling agents to iteratively issue search queries, retrieve external evidence, and reason over it. Despite search's essential role in deep research, black-box web search APIs hinder systematic analysis of search components, leaving the behaviour of established text ranking methods in deep research largely unclear. To fill this gap, we reproduce a selection of key findings and best practices for IR text ranking methods in the deep research setting. In particular, we examine their effectiveness from three perspectives: (i) retrieval units (documents vs. passages), (ii) pipeline configurations (different retrievers, re-rankers, and re-ranking depths), and (iii) query characteristics (the mismatch between agent-issued queries and the training queries of text rankers). We perform experiments on BrowseComp-Plus, a deep research dataset with a fixed corpus, evaluating 2 open-source agents, 5 retrievers, and 3 re-rankers ac...