[2602.15856] Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
Summary
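The paper presents SeleCom, a selector-based soft compression framework for Retrieval-Augmented Generation (RAG). It targets a limitation of existing soft compression methods, which encode entire retrieved documents into compact embeddings regardless of their relevance to the query, and aims to improve both efficiency and the relevance of the context passed to Large Language Models (LLMs).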
Why It Matters
This research tackles the scalability problems of RAG, which stem from excessive context length and redundant retrievals. By introducing a query-conditioned selector, it improves performance while reducing computational cost, which matters for real-world deployments of LLMs.
Key Takeaways
- SeleCom improves the efficiency of Retrieval-Augmented Generation (RAG) by using a query-conditioned selector.
- The framework reduces computation and latency by 33.8% to 84.6% compared to existing soft compression methods.
- Full-compression methods dilute task-relevant information; SeleCom addresses this by conditioning compression on the input query.
- The approach is trained on a diverse synthetic QA dataset, enhancing its applicability across various tasks.
- SeleCom achieves competitive or superior performance compared to non-compressed baselines.
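The contrast between query-agnostic full-compression and query-conditioned selection can be illustrated with a toy example. This is only a sketch under stated assumptions, not SeleCom's actual architecture: the function names (`mean_pool`, `query_conditioned_pool`), the 3-dimensional embeddings, and the softmax temperature are all hypothetical and chosen for illustration. It shows how pooling every token equally dilutes the one query-relevant token, while weighting tokens by similarity to the query preserves it.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_pool(token_vecs):
    """Query-agnostic stand-in for full-compression: every token counts equally."""
    n = len(token_vecs)
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]

def query_conditioned_pool(token_vecs, query_vec, temperature=0.1):
    """Toy selector (hypothetical): softmax-weight tokens by similarity to the query."""
    scores = [dot(v, query_vec) / temperature for v in token_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(token_vecs[0])
    return [sum(w * v[i] for w, v in zip(weights, token_vecs)) for i in range(dim)]

# Hypothetical document: one token aligned with the query, three unrelated tokens.
doc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]

full = mean_pool(doc)
selected = query_conditioned_pool(doc, query)

# Similarity of each compressed vector to the query.
print("full-compression similarity:", dot(full, query))
print("query-conditioned similarity:", dot(selected, query))
```

In this toy setting the mean-pooled vector retains only a quarter of the relevant signal, while the query-conditioned vector is dominated by the relevant token. The paper's selector is a trained model, so the real mechanism will differ from this hand-written weighting.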
Computer Science > Computation and Language
arXiv:2602.15856 (cs)
[Submitted on 25 Jan 2026]
Title: Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
Authors: Yunhao Liu, Zian Jia, Xinyu Gao, Kanjun Xu, Yun Xiong
Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft context compression aims to address this by encoding long documents into compact embeddings, yet they often underperform non-compressed RAG due to their reliance on auto-encoder-like full-compression that forces the encoder to compress all document information regardless of relevance to the input query. In this work, we conduct an analysis on this paradigm and reveal two fundamental limitations: (I) Infeasibility, full-compression conflicts with the LLM's downstream generation behavior; and (II) Non-necessity: full-compression is unnecessary and dilutes task-relevant information density. Motivated by these insights, we introduce SeleCom, a selector-based soft compression framework for RAG that redefines the enc...