[2601.04646] Succeeding at Scale: Automated Dataset Construction and Query-Side Adaptation for Multi-Tenant Search
Computer Science > Information Retrieval
arXiv:2601.04646 (cs)
[Submitted on 8 Jan 2026 (v1), last revised 3 Mar 2026 (this version, v3)]
Title: Succeeding at Scale: Automated Dataset Construction and Query-Side Adaptation for Multi-Tenant Search
Authors: Prateek Jain, Shabari S Nair, Ritesh Goru, Prakhar Agarwal, Ajay Yadav, Yoga Sri Varshan Varadharajan, Constantine Caramanis
Abstract: Large-scale multi-tenant retrieval systems generate extensive query logs but lack curated relevance labels for effective domain adaptation, resulting in substantial underutilized "dark data". This challenge is compounded by the high cost of model updates, as jointly fine-tuning query and document encoders requires full corpus re-indexing, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation uses fusion across diverse sparse and dense retrievers, followed by an LLM-as-a-Judge for consistency filtering and relevance labeling. We further propose an Index-Preserving Adaptation strategy that fine-tunes only the query encoder, achieving strong performance gains while keeping document indices fixed. Experiments on DevRev-Search, S...
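The candidate-generation step described in the abstract fuses rankings from several sparse and dense retrievers. The abstract does not name the fusion method, so the sketch below uses reciprocal rank fusion (RRF) purely as an illustrative assumption. The function name, the constant `k=60`, and the passage IDs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage IDs into one ranking.

    Assumption: RRF stands in for the unspecified fusion step.
    Each passage scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    reported in the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by passage ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical outputs of one sparse and one dense retriever for a query.
sparse_ranking = ["p3", "p1", "p7"]
dense_ranking = ["p1", "p4", "p3"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

In a pipeline like the one the abstract outlines, a fused candidate list of this kind would then be passed to the LLM judge for filtering and relevance labeling. Passages ranked highly by more than one retriever rise to the top, which is the main reason rank-based fusion is a common choice when retriever scores are not directly comparable.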