[2603.29897] UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates
Computer Science > Information Retrieval
arXiv:2603.29897 (cs)
[Submitted on 8 Feb 2026]

Title: UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates
Authors: Yupei Yang, Lin Yang, Wanxi Deng, Lin Qu, Shikui Tu, Lei Xu

Abstract: Reranking is a critical component in many information retrieval pipelines. Despite remarkable progress in text-only settings, multimodal reranking remains challenging, particularly when the candidate set contains hybrid text and image items. A key difficulty is the modality gap: a text reranker is intrinsically closer to text candidates than to image candidates, leading to biased and suboptimal cross-modal ranking. Vision-language models (VLMs) mitigate this gap through strong cross-modal alignment and have recently been adopted to build multimodal rerankers. However, most VLM-based rerankers encode all candidates as images, and treating text as images introduces substantial computational overhead. Meanwhile, existing open-source multimodal rerankers are typically trained on general-domain data and often underperform in domain-specific scenarios. To address these limitations, we propose UniRank, a VLM-based reranking framework that natively scores and orders hybrid text-image candidates without any modality conversion. Building on this hybrid scoring interfa...
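The abstract's central idea is a scoring interface that ranks text and image candidates jointly, each in its native modality, rather than converting text into images first. The paper's actual model is not described in this excerpt, so the following is a minimal illustrative sketch of such a hybrid interface; the `Candidate` type, the token-overlap scorer, and the stubbed image branch are all placeholder assumptions standing in for a real VLM-backed scorer.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    modality: str  # "text" or "image"
    content: str   # raw text, or a path/URI for an image

def score(query: str, cand: Candidate) -> float:
    """Placeholder relevance scorer.

    A UniRank-style reranker would pass the query and the candidate
    (in its native modality) through a VLM to get this score. Here we
    use trivial token overlap for text and a stub for images, purely
    to show the shape of a hybrid scoring interface.
    """
    if cand.modality == "text":
        q = set(query.lower().split())
        c = set(cand.content.lower().split())
        return len(q & c) / max(len(q), 1)
    return 0.0  # image scoring would need an actual VLM; stubbed out

def rerank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Order hybrid candidates by score, with no modality conversion:
    text stays text, images stay images."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)
```

The point of the sketch is the interface, not the scorer: both modalities flow through one `score` function and one sort, which is the property the abstract contrasts with pipelines that rasterize text candidates into images before ranking.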