[2604.17866] Latent Abstraction for Retrieval-Augmented Generation

arXiv - AI · May 09, 2026 · 4 min read

Computer Science > Computation and Language
arXiv:2604.17866 (cs)
[Submitted on 20 Apr 2026 (v1), last revised 7 May 2026 (this version, v2)]

Title: Latent Abstraction for Retrieval-Augmented Generation
Authors: Ha Lan N.T, Minh-Anh Nguyen, Dung D. Le

Abstract: Retrieval-Augmented Generation (RAG) has become a standard approach for enhancing large language models (LLMs) with external knowledge, mitigating hallucinations, and improving factuality. However, existing systems rely on generating natural language queries at each hop and maintaining a strict architectural separation between retriever and generator, preventing them from leveraging the full representational capacity of the LLM. We propose LAnR (Latent Abstraction for RAG), a unified framework in which a single LLM jointly performs encoding, retrieval, and generation entirely within its own latent space. Rather than generating textual queries, LAnR produces dense retrieval vectors from the hidden states of a designated [PRED] token and uses them to match against encoded document representations from the same model. Furthermore, LAnR adaptively decides when sufficient evidence has been retrieved using a lightweight MLP control head over those same hidden states, eliminating both the separate retriever and explicit token-level stopping reasoning. This desig...
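The retrieve-then-decide loop the abstract describes can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names (`top1`, `stop_probability`, `latent_retrieval_loop`), the dot-product scoring, the averaging update that stands in for the LLM re-encoding its context, the one-hidden-layer head, and the 0.5 threshold are all hypothetical choices made for the example. Only the overall shape comes from the abstract: a hidden state serves as the query vector, it is matched against document vectors, and a small MLP over the same hidden state decides when to stop.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]
# Hypothetical control head: (w1, b1, w2, b2) for a one-hidden-layer MLP.
Head = Tuple[Sequence[Vector], Vector, Vector, float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top1(query: Vector, doc_vecs: Sequence[Vector], seen: Set[int]) -> int:
    """Index of the best-scoring unseen document (dot-product similarity, assumed)."""
    candidates = [i for i in range(len(doc_vecs)) if i not in seen]
    return max(candidates, key=lambda i: dot(query, doc_vecs[i]))


def stop_probability(hidden: Vector, w1, b1, w2, b2) -> float:
    """Tiny MLP (ReLU, then sigmoid) over the hidden state; architecture is assumed."""
    layer = [max(0.0, dot(row, hidden) + b) for row, b in zip(w1, b1)]
    logit = dot(w2, layer) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def latent_retrieval_loop(
    pred_hidden: Vector,
    doc_vecs: Sequence[Vector],
    head: Head,
    max_hops: int = 3,
    threshold: float = 0.5,
) -> List[int]:
    """Retrieve documents hop by hop until the control head says to stop."""
    hidden = list(pred_hidden)  # stand-in for the [PRED] token's hidden state
    seen: Set[int] = set()
    retrieved: List[int] = []
    for _ in range(max_hops):
        if len(seen) == len(doc_vecs):
            break
        idx = top1(hidden, doc_vecs, seen)  # the hidden state itself is the query
        seen.add(idx)
        retrieved.append(idx)
        # Toy update: average in the retrieved vector. A real system would
        # re-run the LLM over the extended context instead.
        hidden = [(h + d) / 2 for h, d in zip(hidden, doc_vecs[idx])]
        if stop_probability(hidden, *head) >= threshold:
            break
    return retrieved


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    head = ([[1.0, 1.0]], [0.0], [-1.0], 1.45)
    print(latent_retrieval_loop([1.0, 0.2], docs, head))
```

No textual query is produced at any hop in this sketch; the only things exchanged between "retriever" and "generator" are vectors, which is the separation the abstract says LAnR removes.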

Originally published on May 09, 2026. Curated by AI News.

