[2601.15518] DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking
Summary
This paper presents a two-stage retrieval system designed for the TREC Tip-of-the-Tongue task, integrating multiple retrieval methods with learned reranking to enhance information retrieval performance.
Why It Matters
The study addresses challenges in vague recollection retrieval, a common issue in information retrieval systems. By combining various methodologies, it offers insights into improving recall and ranking accuracy, which is crucial for applications in search engines and AI-driven information systems.
Key Takeaways
- Introduces a hybrid retrieval system combining LLM-based, sparse (BM25), and dense (BGE-M3) retrieval methods.
- Utilizes topic-aware multi-index dense retrieval to enhance performance.
- Achieves recall of 0.66 and NDCG@1000 of 0.41 on the test set, demonstrating the effectiveness of fusion retrieval.
- Generates 5,000 synthetic Tip-of-the-Tongue queries with LLMs to support reranker training.
- Highlights the importance of learned reranking in improving retrieval outcomes.
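The first-stage fusion of sparse and dense rankings can be sketched with reciprocal rank fusion (RRF). The paper only says the stage "merges" the BM25, dense, and LLM-based result lists, so RRF is an assumed fusion choice here, and the document IDs and rankings below are hypothetical:

```python
# Minimal sketch of reciprocal rank fusion (RRF) over ranked lists
# from different retrievers. RRF is an assumed fusion method, not
# confirmed by the paper; doc IDs are made-up stand-ins.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it
    appears in; k=60 is the conventional RRF constant.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # hypothetical sparse (BM25) ranking
dense_hits = ["d3", "d1", "d9"]  # hypothetical dense ranking
fused = rrf_fuse([bm25_hits, dense_hits])
```

A document that appears near the top of several lists outranks one that is top-ranked in only a single list, which is why rank fusion tends to improve recall over any one retriever alone.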
Computer Science > Information Retrieval
arXiv:2601.15518 (cs)
[Submitted on 21 Jan 2026 (v1), last revised 14 Feb 2026 (this version, v2)]

Title: DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking
Authors: Wenxin Zhou, Ritesh Mehta, Anthony Miyaguchi

Abstract: We develop a two-stage retrieval system that combines multiple complementary retrieval methods with a learned reranker and LLM-based reranking to address the TREC Tip-of-the-Tongue (ToT) task. In the first stage, we employ hybrid retrieval that merges LLM-based retrieval, sparse (BM25), and dense (BGE-M3) retrieval methods. We also introduce topic-aware multi-index dense retrieval that partitions the Wikipedia corpus into 24 topical domains. In the second stage, we evaluate both a trained LambdaMART reranker and LLM-based reranking. To support model training, we generate 5,000 synthetic ToT queries using LLMs. Our best system achieves recall of 0.66 and NDCG@1000 of 0.41 on the test set by combining hybrid retrieval with Gemini-2.5-flash reranking, demonstrating the effectiveness of fusion retrieval.

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2601.15518 [cs.IR]
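The topic-aware multi-index retrieval described in the abstract routes a query to one topical shard of the corpus before dense search. The 24-domain partition is from the paper; everything else in this sketch (the topic labels, the routing heuristic, the toy two-dimensional embeddings, and the document IDs) is a hypothetical stand-in:

```python
# Sketch of topic-aware multi-index dense retrieval: the corpus is
# split into per-topic indexes, a query is routed to its predicted
# topic, and dense search runs only inside that shard. The topics,
# embeddings, and documents below are invented for illustration.

def dot(a, b):
    # Inner-product similarity between two embedding vectors.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical per-topic indexes: topic -> list of (doc_id, embedding).
indexes = {
    "film": [("matrix", [1.0, 0.0]), ("inception", [0.9, 0.1])],
    "music": [("ok_computer", [0.0, 1.0])],
}

def predict_topic(query_vec):
    # Stand-in for a learned topic classifier: pick the topic whose
    # documents are, on average, most similar to the query.
    def avg_sim(docs):
        return sum(dot(query_vec, emb) for _, emb in docs) / len(docs)
    return max(indexes, key=lambda t: avg_sim(indexes[t]))

def search(query_vec, top_k=1):
    topic = predict_topic(query_vec)
    ranked = sorted(indexes[topic], key=lambda d: dot(query_vec, d[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

hits = search([1.0, 0.0])
```

Partitioning the index this way trades a small risk of routing errors for searching a much smaller candidate set per query, which is the usual motivation for sharding a Wikipedia-scale dense index by topic.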