[2602.20595] OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
Summary
The paper presents OptiLeak, a reinforcement-learning framework that makes prompt reconstruction attacks against multi-tenant LLM services substantially more efficient, demonstrating that cache-based prompt leakage is a more practical threat than prior cost estimates suggested.
Why It Matters
As multi-tenant LLM serving frameworks become prevalent, understanding and mitigating prompt leakage risks is crucial for safeguarding sensitive information. OptiLeak shows that such attacks are far cheaper than previously reported, underscoring the need for robust cache isolation in production deployments and making the work relevant to both developers and security professionals.
Key Takeaways
- OptiLeak uses reinforcement learning for efficient prompt reconstruction.
- Identifies 'hard tokens' to enhance preference alignment without manual annotation.
- Demonstrates significant reductions in request costs across various model sizes.
- Highlights the severity of cache-based prompt leakage threats.
- Calls for improved cache isolation in production AI systems.
Computer Science > Cryptography and Security
arXiv:2602.20595 (cs) [Submitted on 24 Feb 2026]
Title: OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
Authors: Longxiang Wang, Xiang Zheng, Xuhao Zhang, Yao Zhang, Ye Wu, Cong Wang
Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities enabling prompt leakage attacks. Prior studies identified these attack surfaces yet focused on expanding attack vectors rather than optimizing attack performance, reporting impractically high attack costs that underestimate the true privacy risk. We propose OptiLeak, a reinforcement learning-enhanced framework that maximizes prompt reconstruction efficiency through two-stage fine-tuning. Our key insight is that domain-specific "hard tokens" -- terms difficult to predict yet carrying sensitive information -- can be automatically identified via likelihood ranking and used to construct preference pairs for Direct Preference Optimization, eliminating manual annotation. This enables effective preference alignment while avoiding the overfitting issues of extended supervised fine-tuning. Evaluated on three benchmarks spanning medical and financial domains, Opti...
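To make the abstract's key insight concrete, here is a minimal sketch (not the paper's actual code) of the two steps it describes: ranking tokens by likelihood to flag "hard tokens", then pairing candidate reconstructions into chosen/rejected preference pairs for DPO-style training without manual annotation. The token log-probabilities and candidate strings below are toy values we made up for illustration; in the paper they would come from a language model.

```python
# Illustrative sketch of hard-token identification via likelihood ranking
# and automatic DPO preference-pair construction. Toy data only; a real
# pipeline would score tokens with a language model.

def find_hard_tokens(token_logprobs, k=2):
    """Return the k lowest-likelihood tokens: hard to predict, and
    therefore likely to carry domain-specific (sensitive) information."""
    ranked = sorted(token_logprobs.items(), key=lambda kv: kv[1])
    return [tok for tok, _ in ranked[:k]]

def build_preference_pairs(candidates, hard_tokens):
    """Pair candidate reconstructions: 'chosen' recovers a hard token,
    'rejected' misses it. This replaces manual preference annotation."""
    pairs = []
    for hard in hard_tokens:
        chosen = [c for c in candidates if hard in c]
        rejected = [c for c in candidates if hard not in c]
        for c in chosen:
            for r in rejected:
                pairs.append({"chosen": c, "rejected": r, "hard_token": hard})
    return pairs

# Toy example: a medical prompt where rare clinical terms score low.
logprobs = {"the": -0.5, "patient": -1.2, "has": -0.7,
            "myasthenia": -9.8, "gravis": -8.9}
hard = find_hard_tokens(logprobs, k=2)
candidates = ["the patient has myasthenia gravis",
              "the patient has muscle weakness"]
pairs = build_preference_pairs(candidates, hard)
```

The resulting pairs can feed a standard DPO objective directly; the point of the likelihood-ranking heuristic is that the rare, low-probability terms are exactly the ones a reconstruction attack most needs to get right.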