[2603.00924] Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains
Computer Science > Computation and Language
arXiv:2603.00924 (cs)
[Submitted on 1 Mar 2026]
Title: Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains
Authors: Manil Shrestha, Edward Kim

Abstract: Large Language Models (LLMs) are increasingly used for medical entity extraction, yet their confidence scores are often miscalibrated, limiting safe deployment in clinical settings. We present a conformal prediction framework that provides finite-sample coverage guarantees for LLM-based extraction across two clinical domains. First, we extract structured entities from 1,000 FDA drug labels across eight sections using GPT-4.1, verified via FactScore-based atomic statement evaluation (97.7% accuracy over 128,906 entities). Second, we extract radiological entities from MIMIC-CXR reports using the RadGraph schema with GPT-4.1 and Llama-4-Maverick, evaluated against physician annotations (entity F1: 0.81 to 0.84). Our central finding is that miscalibration direction reverses across domains: on well-structured FDA labels, models are underconfident, requiring modest conformal thresholds ($\tau \approx 0.06$), while on free-text radiology reports, models are overconfident, demanding strict thresholds ($\tau$ up to 0.99). Despite this heterogeneity, conformal prediction ...
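
As a rough illustration of how a conformal threshold $\tau$ can be calibrated from held-out data, the sketch below implements generic split conformal prediction over confidence scores. It is not the authors' procedure: the abstract does not specify their nonconformity score or calibration protocol, so the score `1 - confidence`, the function names `conformal_quantile` and `calibrate_threshold`, and the toy confidences are all assumptions made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    This is the usual finite-sample-corrected quantile of split conformal
    prediction. If the required rank exceeds n, the calibration set is too
    small for the requested level, so the quantile is infinite.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("calibration set must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(scores)[rank - 1]


def calibrate_threshold(correct_confidences, alpha):
    """Turn confidences of verified-correct calibration entities into tau.

    Assumed nonconformity score (hypothetical, not from the paper):
    s = 1 - confidence. Keeping entities with confidence >= tau then
    retains a new, exchangeable correct entity with probability at least
    1 - alpha.
    """
    scores = [1.0 - c for c in correct_confidences]
    q_hat = conformal_quantile(scores, alpha)
    # An infinite quantile means "accept everything", i.e. tau = 0.
    return max(0.0, 1.0 - q_hat)


# Toy calibration confidences (made up for illustration only).
tau = calibrate_threshold([0.9, 0.8, 0.6, 0.5], alpha=0.25)
print(f"tau = {tau:.2f}")
```

Under this assumed score, a model whose correct extractions often receive low confidence ends up with a small $\tau$, and very small calibration sets fall back to accepting everything because the corrected quantile is unbounded.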