[2604.17866] Latent Abstraction for Retrieval-Augmented Generation
Computer Science > Computation and Language
arXiv:2604.17866 (cs)
Submitted on 20 Apr 2026 (v1), last revised 7 May 2026 (this version, v2)
Authors: Ha Lan N.T, Minh-Anh Nguyen, Dung D. Le

Abstract: Retrieval-Augmented Generation (RAG) has become a standard approach for enhancing large language models (LLMs) with external knowledge, mitigating hallucinations, and improving factuality. However, existing systems rely on generating natural language queries at each hop and on maintaining a strict architectural separation between retriever and generator, which prevents them from leveraging the full representational capacity of the LLM. We propose LAnR (Latent Abstraction for RAG), a unified framework in which a single LLM jointly performs encoding, retrieval, and generation entirely within its own latent space. Rather than generating textual queries, LAnR produces dense retrieval vectors from the hidden states of a designated [PRED] token and matches them against document representations encoded by the same model. Furthermore, LAnR adaptively decides when sufficient evidence has been retrieved using a lightweight MLP control head over those same hidden states, eliminating both the separate retriever and explicit token-level reasoning about when to stop. This desig...
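The retrieve-then-decide loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (pred_state_fn, top_k, StopHead, retrieve_until_sufficient) is hypothetical. It assumes small hand-made vectors in place of real LLM hidden states, cosine similarity as the matching function, greedy top-k selection per hop, and a two-layer ReLU MLP with random, untrained weights as the "lightweight MLP control head"; the abstract does not specify any of these details.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k(query, doc_vecs, k):
    """Indices of the k documents with the highest cosine similarity to query."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(d))) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class StopHead:
    """Hypothetical two-layer ReLU MLP mapping a hidden state to P(stop).

    Weights are random here (assumption); a real control head would be trained.
    """

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, h):
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * x for w, x in zip(self.w2, z)) + self.b2)


def retrieve_until_sufficient(pred_state_fn, doc_vecs, stop_head,
                              max_hops=4, threshold=0.5, k=1):
    """Hypothetical multi-hop latent retrieval loop.

    pred_state_fn(retrieved) stands in for running the LLM on the question plus
    the documents retrieved so far and reading the hidden state of [PRED].
    """
    retrieved = []
    for _ in range(max_hops):
        h = pred_state_fn(retrieved)
        if stop_head(h) >= threshold:  # control head judges evidence sufficient
            break
        # The same hidden state doubles as the dense retrieval query.
        ranked = top_k(h, doc_vecs, len(doc_vecs))
        new = [i for i in ranked if i not in retrieved][:k]
        if not new:  # every document has already been retrieved
            break
        retrieved.extend(new)
    return retrieved


# Toy usage: three "document representations" and a fixed [PRED] state.
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]


def query_state(retrieved):
    return [1.0, 0.0, 0.0]


# threshold > 1 disables stopping, so the loop runs until the corpus is exhausted.
print(retrieve_until_sufficient(query_state, docs, StopHead(3, 4), threshold=1.01))  # [1, 2, 0]
```

The point of the design illustrated here is that one vector, the [PRED] hidden state, feeds both the retrieval query and the stop decision, so no separate retriever model or generated stop token is needed. In a real system the stop head would be trained so that its output rises once the retrieved evidence suffices.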