[2603.00924] Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

[2603.00924] Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.00924: Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains

Computer Science > Computation and Language arXiv:2603.00924 (cs) [Submitted on 1 Mar 2026] Title:Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains Authors:Manil Shrestha, Edward Kim View a PDF of the paper titled Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains, by Manil Shrestha and Edward Kim View PDF HTML (experimental) Abstract:Large Language Models (LLMs) are increasingly used for medical entity extraction, yet their confidence scores are often miscalibrated, limiting safe deployment in clinical settings. We present a conformal prediction framework that provides finite-sample coverage guarantees for LLM-based extraction across two clinical domains. First, we extract structured entities from 1,000 FDA drug labels across eight sections using GPT-4.1, verified via FactScore-based atomic statement evaluation (97.7\% accuracy over 128,906 entities). Second, we extract radiological entities from MIMIC-CXR reports using the RadGraph schema with GPT-4.1 and Llama-4-Maverick, evaluated against physician annotations (entity F1: 0.81 to 0.84). Our central finding is that miscalibration direction reverses across domains: on well-structured FDA labels, models are underconfident, requiring modest conformal thresholds ($\tau \approx 0.06$), while on free-text radiology reports, models are overconfident, demanding strict thresholds ($\tau$ up to 0.99). Despite this heterogeneity, conformal prediction ...

Originally published on March 03, 2026. Curated by AI News.

Related Articles

Llms

[For Hire] Junior AI/ML Engineer | RAG · LLMs · FastAPI · Vector DBs | Remote

Posting this for a friend who isn't on Reddit. A recent graduate, entry level, no commercial production experience but spent the past yea...

Reddit - ML Jobs · 1 min ·
I Asked ChatGPT What WIRED’s Reviewers Recommend—Its Answers Were All Wrong | WIRED
Llms

I Asked ChatGPT What WIRED’s Reviewers Recommend—Its Answers Were All Wrong | WIRED

Want to know what our reviewers have actually tested and picked as the best TVs, headphones, and laptops? Ask ChatGPT, and it'll give you...

Wired - AI · 8 min ·
A Cross-Sectional Study Evaluating the Quality of AI-Generated Patient Education Guides on Diet and Exercise for Diabetes, Hypertension, and Obesity Using ChatGPT-4o, Google Gemini 1.5, Claude Sonnet 4, Perplexity, and Grok
Llms

A Cross-Sectional Study Evaluating the Quality of AI-Generated Patient Education Guides on Diet and Exercise for Diabetes, Hypertension, and Obesity Using ChatGPT-4o, Google Gemini 1.5, Claude Sonnet 4, Perplexity, and Grok

This study evaluates the quality of AI-generated patient education guides on diet and exercise for chronic conditions, comparing five lan...

AI Tools & Products · 2 min ·
Llms

Agents Can Now Propose and Deploy Their Own Code Changes

150 clones yesterday. 43 stars in 3 days. Every agent framework you've used (LangChain, LangGraph, Claude Code) assumes agents are tools ...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime