[2602.17949] CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications


Summary

CUICurate introduces a GraphRAG framework for automated curation of clinical concepts in NLP, enhancing efficiency and accuracy in clinical data processing.

Why It Matters

Clinical concept curation is labor-intensive, yet NLP applications in healthcare depend on it. By automating the generation of concept sets, CUICurate makes clinical data analysis more scalable and reproducible, which in turn supports research efficiency and, ultimately, patient outcomes.

Key Takeaways

  • CUICurate automates the curation of clinical concept sets, reducing manual effort.
  • The framework retrieves candidate concepts from a UMLS knowledge graph, then uses large language models to filter and classify them.
  • It produces larger and more complete concept sets than the manually curated benchmark.
  • GPT-5-mini showed higher recall, while GPT-5 aligned better with clinician judgments.
  • Outputs are stable and computationally efficient, making it suitable for various clinical NLP applications.
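
The recall comparison in the takeaways can be made concrete with a small set-based calculation. This is a minimal sketch, not the paper's evaluation code: the function name `precision_recall` and the CUI strings are hypothetical, and the exact metrics used in the paper are not stated in this excerpt.

```python
def precision_recall(predicted, reference):
    """Compare a generated concept set against a reference set of CUIs.

    Both arguments are iterables of CUI strings. Returns (precision, recall).
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up CUIs for illustration only.
generated = ["C0000001", "C0000002", "C0000003", "C0000004"]
benchmark = ["C0000001", "C0000002", "C0000009"]
precision, recall = precision_recall(generated, benchmark)
```

A model that returns a larger set tends to score higher on recall and lower on precision, which is one way a smaller model could show higher recall while a larger one agrees more closely with clinicians.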

Computer Science > Computation and Language
arXiv:2602.17949 (cs) [Submitted on 20 Feb 2026]

Title: CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

Authors: Victoria Blake, Mathew Miller, Jamie Novak, Sze-yuan Ooi, Blanca Gallego

Abstract

Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but a concept set comprising related synonyms, subtypes, and supertypes. Constructing such concept sets is labour-intensive, inconsistently performed, and poorly supported by existing tools, particularly for NLP pipelines that operate directly on UMLS CUIs.

Methods: We present CUICurate, a graph-based retrieval-augmented generation (GraphRAG) framework for automated UMLS concept set curation. A UMLS knowledge graph (KG) was constructed and embedded for semantic retrieval. For each target concept, candidate CUIs were retrieved from the KG, followed by large language model (LLM) filtering and classification steps comparing two LLMs (GPT-5 and GPT-5-mini). The framework was evaluated on five lexically heterogeneous clinical concepts against a manually curated benchmark and gold-standar...
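
The retrieve-then-filter flow described in the Methods can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the CUIs, labels, and three-dimensional vectors are made up, cosine similarity is assumed as the retrieval measure, and the `keep_fn` callable is a hypothetical stand-in for the LLM filtering step. The classification step is omitted.

```python
import math

# Toy stand-in for an embedded UMLS knowledge graph: CUI -> (label, vector).
# All identifiers, labels, and vectors here are invented for illustration.
TOY_KG = {
    "C0000001": ("heart failure", [0.9, 0.1, 0.0]),
    "C0000002": ("congestive heart failure", [0.85, 0.15, 0.05]),
    "C0000003": ("left ventricular failure", [0.8, 0.2, 0.1]),
    "C0000004": ("renal failure", [0.1, 0.9, 0.0]),
    "C0000005": ("fracture of femur", [0.0, 0.1, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(target_vec, kg, top_k):
    """Return the top_k CUIs ranked by similarity to the target embedding."""
    scored = [(cosine(target_vec, vec), cui) for cui, (_, vec) in kg.items()]
    scored.sort(reverse=True)
    return [cui for _, cui in scored[:top_k]]


def curate_concept_set(target_vec, kg, keep_fn, top_k=4):
    """Retrieve candidates, then keep those the filter accepts.

    keep_fn(label) -> bool is a placeholder for an LLM judgment; a real
    system would prompt a model with the target concept and the candidate.
    """
    candidates = retrieve_candidates(target_vec, kg, top_k)
    return [cui for cui in candidates if keep_fn(kg[cui][0])]


# Hypothetical filter: accept only cardiac-sounding labels.
def looks_cardiac(label):
    return "heart" in label or "ventricular" in label


target = TOY_KG["C0000001"][1]
candidates = retrieve_candidates(target, TOY_KG, top_k=4)
concept_set = curate_concept_set(target, TOY_KG, looks_cardiac)
```

In this toy run, retrieval casts a wide net (it also returns "renal failure" because it ranks fourth by similarity), and the filter removes it. That split mirrors the design in the abstract, where graph retrieval proposes candidates and the LLM decides which belong in the concept set.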
