[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Summary
The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in risk-sensitive applications by combining abstention-aware sampling and conformalized filtering.
Why It Matters
As LLMs are increasingly used in critical applications such as open-ended question answering, ensuring the reliability of their outputs is essential. SAFER addresses a key limitation of existing selective conformal prediction methods, which assume that admissible answers can always be obtained by finite sampling, with a framework that balances risk control and data efficiency, making it relevant to developers and researchers in AI safety.
Key Takeaways
- SAFER combines abstention-aware sampling with conformalized filtering to enhance output trustworthiness.
- The framework allows for calibrated sampling budgets based on user-defined risk levels.
- SAFER is adaptable to various task-specific criteria, ensuring broad applicability.
- The method removes the unrealistic assumption of existing selective conformal prediction (SCP) techniques that admissible answers for every instance can be obtained via finite sampling.
- High data efficiency is achieved, making it suitable for real-world applications.
Computer Science > Artificial Intelligence
arXiv:2510.10193 (cs)
[Submitted on 11 Oct 2025 (v1), last revised 16 Feb 2026 (this version, v3)]
Title: SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Authors: Qingni Wang, Yue Fan, Xin Eric Wang
Abstract: As large language models (LLMs) are increasingly deployed in risk-sensitive applications such as real-world open-ended question answering (QA), ensuring the trustworthiness of their outputs has become critical. Existing selective conformal prediction (SCP) methods provide statistical guarantees by constructing prediction sets with a constrained miscoverage rate for correct answers. However, prior works unrealistically assume that admissible answers for all instances can be obtained via finite sampling, even for open-ended QA scenarios that lack a fixed and finite solution space. To address this, we introduce a two-stage risk control framework comprising abstention-aware sampling and conformalized filtering (SAFER). Firstly, on a held-out calibration set, SAFER calibrates a sampling budget within the maximum sampling cap, using the Clopper-Pearson exact method at a user-desired risk level (i.e., the maximum allowable miscoverage rate of the sampling sets). If the risk level cannot be satisfied within the cap, we abstain; otherwise, the calibrated sa...
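The first-stage calibration described in the abstract — pick the smallest sampling budget whose Clopper-Pearson upper bound on the miscoverage rate stays below the user's risk level, and abstain if no budget under the cap qualifies — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`clopper_pearson_upper`, `calibrate_budget`) and the miss-indicator interface are hypothetical, and the exact binomial bound is obtained by bisecting the binomial tail with only the standard library.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(k: int, n: int, delta: float = 0.05) -> float:
    """Exact (Clopper-Pearson) one-sided upper confidence bound for a
    binomial proportion: the p at which P(X <= k; n, p) falls to delta,
    located by bisection (the tail is monotone decreasing in p)."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; 2**-60 easily exceeds needed precision
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

def calibrate_budget(misses_at, max_cap: int, alpha: float, delta: float = 0.05):
    """Return the smallest budget m <= max_cap whose miscoverage upper
    bound is <= alpha, or None to signal abstention.

    misses_at(m) -> list of bools, one per calibration instance: True if
    the size-m sampling set contains no admissible answer (a hypothetical
    interface; the paper's exact procedure may differ).
    """
    for m in range(1, max_cap + 1):
        miss = misses_at(m)
        if clopper_pearson_upper(sum(miss), len(miss), delta) <= alpha:
            return m
    return None  # risk level unattainable within the cap: abstain
```

One practical consequence of the exact bound: with zero observed misses on n calibration instances, the 95% upper bound is 1 - 0.05**(1/n), so even perfect empirical coverage needs a large enough calibration set (n around 60 or more) before a risk level of 0.05 can be certified.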