
[2602.17949] CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

arXiv - AI February 23, 2026 4 min read Article

Summary

CUICurate is a GraphRAG-based framework that automates the curation of UMLS clinical concept sets for NLP applications, with the aim of making clinical text processing more efficient and accurate.

Why It Matters

Curating clinical concept sets by hand is labour-intensive and inconsistently performed, yet many healthcare NLP applications depend on it. By automating concept set generation, CUICurate is intended to make clinical data analysis more scalable and reproducible, which could in turn support research efficiency and patient care.

Key Takeaways

CUICurate automates the curation of clinical concept sets, reducing manual effort.
The framework retrieves candidate concepts from a UMLS knowledge graph and uses large language models to filter and classify them.
It produces larger and more complete concept sets than the manually curated benchmarks.
GPT-5-mini showed higher recall, while GPT-5 aligned better with clinician judgments.
Outputs are stable and computationally efficient, making it suitable for various clinical NLP applications.
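
The recall comparison above can be made concrete with a small set-based calculation. This is a minimal sketch, assuming a concept set is scored as a plain set of identifiers against a benchmark set; the function name `precision_recall` and the identifiers `X001` to `X005` are hypothetical, and the paper's exact evaluation protocol is not given in this excerpt.

```python
def precision_recall(predicted: set[str], benchmark: set[str]) -> tuple[float, float]:
    """Score a predicted concept set against a benchmark concept set."""
    true_positives = len(predicted & benchmark)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall


# Invented identifiers for illustration; these are not real UMLS CUIs.
benchmark = {"X001", "X002", "X003"}
predicted = {"X001", "X002", "X003", "X005"}

precision, recall = precision_recall(predicted, benchmark)
```

A larger predicted set tends to raise recall and may lower precision, which is one way a model could score higher on recall while agreeing less with clinician judgments.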

Computer Science > Computation and Language
arXiv:2602.17949 (cs)
[Submitted on 20 Feb 2026]

Title: CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

Authors: Victoria Blake, Mathew Miller, Jamie Novak, Sze-yuan Ooi, Blanca Gallego

Abstract:

Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but a concept set comprising related synonyms, subtypes, and supertypes. Constructing such concept sets is labour-intensive, inconsistently performed, and poorly supported by existing tools, particularly for NLP pipelines that operate directly on UMLS CUIs.

Methods: We present CUICurate, a Graph-based retrieval-augmented generation (GraphRAG) framework for automated UMLS concept set curation. A UMLS knowledge graph (KG) was constructed and embedded for semantic retrieval. For each target concept, candidate CUIs were retrieved from the KG, followed by large language model (LLM) filtering and classification steps comparing two LLMs (GPT-5 and GPT-5-mini). The framework was evaluated on five lexically heterogeneous clinical concepts against a manually curated benchmark and gold-standar...
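
The retrieve-then-filter flow described in the Methods can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the node IDs, names, three-dimensional embeddings, and edges are invented, the helper names (`retrieve_candidates`, `curate`, `mock_classifier`) are hypothetical, and the LLM step is replaced by a rule-based stand-in so the example runs without any external service.

```python
from math import sqrt
from typing import Callable

# Toy stand-in for an embedded UMLS knowledge graph. IDs, names, vectors and
# edges are invented for illustration; they are not real UMLS content.
NODES = {
    "X001": ("heart failure", [0.9, 0.1, 0.0]),
    "X002": ("congestive heart failure", [0.85, 0.15, 0.0]),
    "X003": ("cardiac disease", [0.6, 0.4, 0.0]),
    "X004": ("fracture of femur", [0.0, 0.1, 0.9]),
}
EDGES = {
    "X001": {"X002", "X003"},
    "X002": {"X001"},
    "X003": {"X001"},
    "X004": set(),
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def retrieve_candidates(target: str, k: int = 3) -> set[str]:
    """Top-k nodes by embedding similarity, plus the target's graph neighbours."""
    target_vec = NODES[target][1]
    ranked = sorted(
        (cui for cui in NODES if cui != target),
        key=lambda cui: cosine(target_vec, NODES[cui][1]),
        reverse=True,
    )
    return (set(ranked[:k]) | EDGES[target]) - {target}


def curate(target: str, classify: Callable[[str, str], str]) -> dict[str, str]:
    """Keep candidates the classifier does not label 'unrelated'."""
    concept_set = {}
    for cui in sorted(retrieve_candidates(target)):
        label = classify(NODES[target][0], NODES[cui][0])
        if label != "unrelated":
            concept_set[cui] = label
    return concept_set


def mock_classifier(target_name: str, candidate_name: str) -> str:
    """Rule-based placeholder for the LLM filtering and classification step."""
    if target_name in candidate_name:
        return "subtype"
    if candidate_name == "cardiac disease":
        return "supertype"
    return "unrelated"


concept_set = curate("X001", mock_classifier)
print(concept_set)
```

In this toy, the retrieval step deliberately returns an off-topic candidate (`X004`) and the classifier removes it, which mirrors the division of labour in the abstract: broad retrieval from the graph first, then model-based filtering and labelling. In a real pipeline the `classify` callable would wrap an LLM call.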
