[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

Summary

The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in risk-sensitive applications by combining abstention-aware sampling and conformalized filtering.

Why It Matters

As LLMs are increasingly used in critical applications such as open-ended question answering, ensuring the reliability of their outputs is essential. SAFER addresses a key limitation of existing selective conformal prediction methods, which assume that admissible answers for every instance can be obtained via finite sampling, by introducing a framework that balances risk control with data efficiency, making it significant for developers and researchers in AI safety.

Key Takeaways

  • SAFER combines abstention-aware sampling with conformalized filtering to enhance output trustworthiness (a sketch of the filtering step follows this list).
  • The framework calibrates the sampling budget to a user-defined risk level, i.e., the maximum allowable miscoverage rate.
  • SAFER is adaptable to various task-specific admissibility criteria, ensuring broad applicability.
  • The method relaxes the unrealistic assumption of existing selective conformal prediction (SCP) techniques that admissible answers can always be found via finite sampling.
  • The framework achieves high data efficiency, making it practical for real-world applications.
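
The abstract excerpt below is truncated before it reaches the second stage, so as a point of reference, here is a minimal sketch of what a standard split conformal filtering step looks like. It assumes candidates have already been scored by some nonconformity function; the helper names, the score function, and the finite-sample quantile adjustment are textbook conformal prediction machinery, not the paper's actual implementation.

```python
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Finite-sample-adjusted (1 - alpha) quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    # Standard split-conformal correction: ceil((n + 1)(1 - alpha)) / n, capped at 1.
    level = min(np.ceil((n + 1) * (1.0 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, level, method="higher"))

def filter_candidates(candidates: list, scores: list[float], threshold: float) -> list:
    """Keep sampled answers whose nonconformity score falls within the threshold."""
    return [c for c, s in zip(candidates, scores) if s <= threshold]
```

With a held-out calibration set of scored correct answers, a threshold chosen this way retains a new correct answer with probability at least 1 - alpha, which is the flavor of coverage guarantee SCP methods provide.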

Abstract

Computer Science > Artificial Intelligence · arXiv:2510.10193 (cs)
Submitted on 11 Oct 2025 (v1); last revised 16 Feb 2026 (v3)
Title: SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Authors: Qingni Wang, Yue Fan, Xin Eric Wang

As large language models (LLMs) are increasingly deployed in risk-sensitive applications such as real-world open-ended question answering (QA), ensuring the trustworthiness of their outputs has become critical. Existing selective conformal prediction (SCP) methods provide statistical guarantees by constructing prediction sets with a constrained miscoverage rate for correct answers. However, prior works unrealistically assume that admissible answers for all instances can be obtained via finite sampling, even for open-ended QA scenarios that lack a fixed and finite solution space. To address this, we introduce a two-stage risk control framework comprising abstention-aware sampling and conformalized filtering (SAFER). Firstly, on a held-out calibration set, SAFER calibrates a sampling budget within the maximum sampling cap, using the Clopper-Pearson exact method at a user-desired risk level (i.e., the maximum allowable miscoverage rate of the sampling sets). If the risk level cannot be satisfied within the cap, we abstain; otherwise, the calibrated sa...
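
The calibration step described in the first stage is concrete enough to sketch. Below is a minimal Python illustration of choosing a sampling budget with the Clopper-Pearson exact upper bound: pick the smallest budget whose bounded miscoverage rate meets the user's risk level, and abstain if none does within the cap. The miscoverage indicator matrix, the confidence parameter `delta`, and the function names are assumptions for illustration, not the paper's implementation.

```python
from scipy.stats import beta

def cp_upper_bound(misses: int, n: int, delta: float = 0.05) -> float:
    """One-sided Clopper-Pearson exact upper confidence bound on a binomial rate."""
    if misses >= n:
        return 1.0
    # Upper endpoint of the exact interval: Beta quantile at 1 - delta
    # with parameters (misses + 1, n - misses).
    return float(beta.ppf(1.0 - delta, misses + 1, n - misses))

def calibrate_budget(miss_matrix, max_cap: int, alpha: float, delta: float = 0.05):
    """Smallest sampling budget whose bounded miscoverage rate meets the risk level.

    miss_matrix[k][i] is 1 if, for calibration instance i, a sampling set of
    size k + 1 contains no admissible answer (a miscoverage event), else 0.
    Returns None to signal abstention when no budget within the cap suffices.
    """
    n = len(miss_matrix[0])
    for k in range(max_cap):
        if cp_upper_bound(sum(miss_matrix[k]), n, delta) <= alpha:
            return k + 1  # calibrated sampling budget
    return None  # abstain: the risk level cannot be met within the cap
```

Once a budget is calibrated this way, drawing exactly that many samples per test query (or abstaining) is what grounds the stage-one miscoverage guarantee in a frequentist bound rather than an empirical average.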
