[2604.05467] CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
Computer Science > Information Retrieval
arXiv:2604.05467 (cs)
[Submitted on 7 Apr 2026]
Title: CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
Authors: Siddharth Jain, Venkat Narayan Vedam
Abstract: As language models shift from single-shot answer generation toward multi-step reasoning that retrieves and consumes evidence mid-inference, evaluating the role of individual retrieved items becomes more important. Existing RAG evaluation typically targets final-answer quality, citation faithfulness, or answer-level attribution, but none of these directly targets the intervention-based, per-evidence-item utility view we study here. We introduce CUE-R, a lightweight intervention-based framework for measuring per-evidence-item operational utility in single-shot RAG using shallow observable retrieval-use traces. CUE-R perturbs individual evidence items via REMOVE, REPLACE, and DUPLICATE operators, then measures changes along three utility axes (correctness, proxy-based grounding faithfulness, and confidence error) plus a trace-divergence signal. We also outline an operational evidence-role taxonomy for interpreting intervention outcomes. Experiments on HotpotQA and 2WikiMultihopQA with Qwen-3 8B and GPT-5.2 reveal a consistent pattern: REMOVE and REPLACE substantially harm correctness and grounding while p...
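Illustrative sketch (not from the paper): the three perturbation operators named in the abstract can be pictured as simple list transformations over the retrieved evidence set, followed by a comparison of a utility score before and after the intervention. Everything below is an assumption made for illustration: the function names, the `toy_answer` stand-in for a model call, and the exact-match correctness delta are hypothetical and are not the authors' implementation. Only the correctness axis is shown; grounding faithfulness, confidence error, and trace divergence are omitted.

```python
from typing import Callable, List, Sequence


def remove(evidence: Sequence[str], i: int) -> List[str]:
    """REMOVE: drop the i-th evidence item."""
    return [e for j, e in enumerate(evidence) if j != i]


def replace(evidence: Sequence[str], i: int, distractor: str) -> List[str]:
    """REPLACE: swap the i-th evidence item for a distractor passage."""
    out = list(evidence)
    out[i] = distractor
    return out


def duplicate(evidence: Sequence[str], i: int) -> List[str]:
    """DUPLICATE: repeat the i-th evidence item right after itself."""
    out = list(evidence)
    out.insert(i + 1, evidence[i])
    return out


def correctness_delta(
    answer_fn: Callable[[Sequence[str]], str],
    evidence: Sequence[str],
    perturbed: Sequence[str],
    gold: str,
) -> int:
    """Change in exact-match correctness caused by the intervention.

    Returns -1 if the perturbation broke a correct answer, 0 if nothing
    changed, and +1 if it fixed an incorrect answer.
    """
    base = int(answer_fn(evidence) == gold)
    pert = int(answer_fn(perturbed) == gold)
    return pert - base


def toy_answer(evidence: Sequence[str]) -> str:
    """Hypothetical stand-in for a RAG model: a keyword lookup, not an LLM."""
    return "Paris" if any("Paris" in e for e in evidence) else "unknown"


evidence = ["The capital of France is Paris.", "France is in Europe."]
gold = "Paris"

for name, perturbed in [
    ("REMOVE", remove(evidence, 0)),
    ("REPLACE", replace(evidence, 0, "Bananas are yellow.")),
    ("DUPLICATE", duplicate(evidence, 0)),
]:
    print(name, correctness_delta(toy_answer, evidence, perturbed, gold))
```

In a real setting `toy_answer` would be replaced by a call to the model under evaluation, and the loop would run over every evidence index rather than index 0 alone. The numbers this toy prints reflect only the keyword lookup and say nothing about the paper's reported results.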