[2508.02872] Highlight & Summarize: RAG without the jailbreaks



Summary

The paper presents Highlight & Summarize (H&S), a novel design pattern for retrieval-augmented generation (RAG) systems that prevents jailbreaking by not revealing user queries to generative LLMs.

Why It Matters

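As LLM-backed chatbots become more prevalent, malicious users can craft prompts that cause the model to generate undesirable content or to perform a task other than its intended one. Existing defenses, such as hardening the system prompt or adding classifiers, are probabilistic and relatively easy to bypass. H&S instead aims to prevent these attacks by design, which makes it relevant to developers and researchers working on RAG systems and AI safety.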

Key Takeaways

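  • H&S splits the RAG pipeline into two components (a highlighting step and a summarization step) instead of passing the user's question straight to a generative LLM.
  • The method prevents jailbreaking and model hijacking by never revealing the user's question to the generative LLM.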
  • H&S shows comparable or improved performance in QA tasks over traditional RAG systems.
  • The approach targets attacks that system-prompt hardening and classifier-based defenses mitigate only probabilistically.
  • The research contributes to the ongoing dialogue on AI safety and responsible AI deployment.

Computer Science > Computation and Language

arXiv:2508.02872 (cs)
[Submitted on 4 Aug 2025 (v1), last revised 13 Feb 2026 (this version, v2)]

Title: Highlight & Summarize: RAG without the jailbreaks

Authors: Giovanni Cherubin, Andrew Paverd

Abstract: Preventing jailbreaking and model hijacking of Large Language Models (LLMs) is an important yet challenging task. When interacting with a chatbot, malicious users can input specially crafted prompts that cause the LLM to generate undesirable content or perform a different task from its intended purpose. Existing systems attempt to mitigate this by hardening the LLM's system prompt or using additional classifiers to detect undesirable content or off-topic conversations. However, these probabilistic approaches are relatively easy to bypass due to the very large space of possible inputs and undesirable outputs.

We present and evaluate Highlight & Summarize (H&S), a new design pattern for retrieval-augmented generation (RAG) systems that prevents these attacks by design. The core idea is to perform the same task as a standard RAG pipeline (i.e., to provide natural language answers to questions, based on relevant sources) without ever revealing the user's question to the generative LLM. This is achieved by splitting the pipeline into two components: a highlighter, which takes the user'...
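The two-component split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract excerpt above is cut off before it finishes describing the components, so everything here is an assumption made for illustration: the function names (`highlight`, `summarize`, `answer`), the keyword-overlap highlighter, the `STOPWORDS` list, and the `generate` callable that stands in for a generative LLM. What the sketch preserves is the property the abstract states: the generative step never receives the user's question.

```python
import re
from typing import Callable, List, Set

# Hypothetical stopword list, only to keep the toy overlap score meaningful.
STOPWORDS: Set[str] = {"the", "a", "an", "is", "are", "of", "for", "and", "to", "in", "what"}


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def highlight(question: str, documents: List[str], min_overlap: int = 2) -> List[str]:
    """Toy highlighter (assumption): return verbatim document sentences that
    share at least `min_overlap` tokens with the question. The question only
    influences WHICH sentences are selected; its text is never copied out."""
    q = _tokens(question)
    selected = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sentence and len(q & _tokens(sentence)) >= min_overlap:
                selected.append(sentence)
    return selected


def summarize(highlights: List[str], generate: Callable[[str], str]) -> str:
    """Build a prompt from the highlighted passages only and call the
    generative model. The user's question is not a parameter here."""
    prompt = "Summarize the following passages into a short answer:\n" + "\n".join(highlights)
    return generate(prompt)


def answer(question: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Run the two stages in sequence; only `highlight` sees the question."""
    highlights = highlight(question, documents)
    if not highlights:
        return "No relevant passage found."
    return summarize(highlights, generate)
```

Because the highlights in this sketch are verbatim spans of the retrieved documents, an instruction embedded in the question (for example "ignore previous instructions") has no path into the prompt passed to `generate`. A real highlighter would need a far better relevance model than keyword overlap; the sketch only shows the data flow.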
