[2602.00104] R3G: A Reasoning--Retrieval--Reranking Framework for Vision-Centric Answer Generation
arXiv:2602.00104 (cs)
[Submitted on 25 Jan 2026 (v1), last revised 7 Apr 2026 (this version, v2)]
Title: R3G: A Reasoning--Retrieval--Reranking Framework for Vision-Centric Answer Generation
Authors: Zhuohong Chen, Zhengxian Wu, Zirui Liao, Shenao Jiang, Hangrui Xu, Yang Chen, Chaokui Su, Xiaoyu Liu, Haoqian Wang
Abstract: Vision-centric retrieval for VQA requires retrieving images to supply missing visual cues and integrating them into the reasoning process. However, selecting the right images and integrating them effectively into the model's reasoning remains challenging. To address this challenge, we propose R3G, a modular Reasoning-Retrieval-Reranking framework. It first produces a brief reasoning plan that specifies the required visual cues, then adopts a two-stage strategy, with coarse retrieval followed by fine-grained reranking, to select evidence. On MRAG-Bench, R3G improves accuracy across six MLLM backbones and nine sub-scenarios, achieving state-of-the-art overall performance. Ablations show that sufficiency-aware reranking and reasoning steps are complementary, helping the model both choose the right images and use them well. We release code and data at this https URL.
Subjects: Computer Vision and Pattern Recognitio...
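The abstract's pipeline (a reasoning plan naming the required visual cues, then coarse retrieval, then fine-grained reranking) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Candidate` type, the cue tags, the cosine-similarity retrieval, and the cue-coverage score are all hypothetical stand-ins chosen for illustration. In particular, the reasoning plan is simply passed in as a set of cue strings, whereas the paper describes the plan as something the framework produces.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Candidate:
    """Hypothetical retrieval candidate; not a type from the R3G code release."""
    image_id: str
    embedding: List[float]  # assumed precomputed image embedding
    cues: Set[str]          # assumed tags for the visual cues the image shows


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_retrieve(query: List[float], pool: List[Candidate], k: int) -> List[Candidate]:
    """Stage 1 (assumed form): keep the k candidates most similar to the query."""
    ranked = sorted(pool, key=lambda c: cosine(query, c.embedding), reverse=True)
    return ranked[:k]


def rerank(required_cues: Set[str], candidates: List[Candidate], top_n: int) -> List[Candidate]:
    """Stage 2 (assumed form): order candidates by the fraction of required cues they cover.

    sorted() is stable, so ties keep their coarse-retrieval order.
    """
    def coverage(c: Candidate) -> float:
        if not required_cues:
            return 0.0
        return len(required_cues & c.cues) / len(required_cues)

    return sorted(candidates, key=coverage, reverse=True)[:top_n]


# Toy data: the plan asks for two cues; "b" is slightly less similar than "a"
# but covers both cues, so reranking promotes it.
plan_cues = {"wing", "beak"}
pool = [
    Candidate("a", [1.0, 0.0], {"wing"}),
    Candidate("b", [0.9, 0.1], {"wing", "beak"}),
    Candidate("c", [0.0, 1.0], {"wing", "beak"}),
]
shortlist = coarse_retrieve([1.0, 0.0], pool, k=2)
evidence = rerank(plan_cues, shortlist, top_n=1)
```

The toy data shows why the two stages can disagree: similarity alone ranks "a" first, while scoring against the planned cues selects "b". Candidate "c" covers both cues as well but never reaches the reranker, because the coarse stage filters it out.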