[2602.21216] EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning
Summary
This study fine-tunes pre-trained language models enriched with biomedical entity information, and adds a multiple instance learning step, to detect EQ-5D use in abstracts screened for systematic literature reviews.
Why It Matters
The EQ-5D is a standardized instrument for evaluating health-related quality of life. In health economics, systematic literature reviews depend on correctly identifying publications that use it, yet manual screening of large volumes of literature is time-consuming, error-prone, and inconsistent. The study proposes an automated alternative aimed at researchers and healthcare professionals who carry out such reviews.
Key Takeaways
- The study improves EQ-5D detection using enriched language models.
- Entity enrichment enhances model generalization and adaptation.
- The proposed method significantly outperforms classical and recent PLM baselines.
- Multiple Instance Learning (MIL) with attention pooling aggregates sentence-level information into study-level predictions.
- Results indicate potential for automating systematic literature reviews.
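To make the idea of entity enrichment more concrete, here is a minimal sketch of one way a sentence could be augmented with extracted entities before it is passed to a language model. The summary does not say how the authors combine entities with the text, so the `[SEP]`-joined format, the function name `enrich_sentence`, and the dictionary-based `toy_extractor` are all hypothetical. With scispaCy installed, the extractor could instead load a model with `spacy.load` (for example `en_core_sci_sm`, an assumed choice) and return `[ent.text for ent in nlp(sentence).ents]`.

```python
from typing import Callable, List


def enrich_sentence(
    sentence: str,
    extract_entities: Callable[[str], List[str]],
    sep: str = " [SEP] ",
) -> str:
    """Append extracted entity mentions to a sentence.

    The output format is an assumption made for illustration,
    not the format used in the paper.
    """
    unique: List[str] = []
    seen_lower = set()
    for entity in extract_entities(sentence):
        key = entity.lower()
        if key not in seen_lower:  # drop repeated mentions, keep first-seen order
            seen_lower.add(key)
            unique.append(entity)
    if not unique:
        return sentence  # nothing to add
    return sentence + sep + " ; ".join(unique)


def toy_extractor(sentence: str) -> List[str]:
    """Hypothetical stand-in for a biomedical NER model (simple substring lookup)."""
    vocabulary = ["EQ-5D", "quality of life", "diabetes"]
    lowered = sentence.lower()
    return [term for term in vocabulary if term.lower() in lowered]


if __name__ == "__main__":
    text = "Health-related quality of life was measured with the EQ-5D."
    print(enrich_sentence(text, toy_extractor))
```

Passing the extractor in as a callable keeps the sketch runnable without any model download and makes it easy to swap in a real named-entity recognizer.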
Computer Science > Computation and Language
arXiv:2602.21216 (cs)
[Submitted on 30 Jan 2026]
Authors: Zhyar Rzgar K Rostam, Gábor Kertész
Abstract
The EQ-5D (EuroQol 5-Dimensions) is a standardized instrument for the evaluation of health-related quality of life. In health economics, systematic literature reviews (SLRs) depend on the correct identification of publications that use the EQ-5D, but manual screening of large volumes of scientific literature is time-consuming, error-prone, and inconsistent. In this study, we investigate fine-tuning of general-purpose (BERT) and domain-specific (SciBERT, BioBERT) pre-trained language models (PLMs), enriched with biomedical entity information extracted through scispaCy models for each statement, to improve EQ-5D detection from abstracts. We conduct nine experimental setups, including combining three scispaCy models with three PLMs, and evaluate their performance at both the sentence and study levels. Furthermore, we explore a Multiple Instance Learning (MIL) approach with attention pooling to aggregate sentence-level information into study-level predictions, where each abstract is represented as a b...
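
The attention-pooling step mentioned in the abstract can be illustrated with a small, dependency-free sketch. It assumes each sentence of an abstract has already been encoded as a fixed-length vector, and it scores sentences with a single linear attention vector. The scoring function, the names `attention_pool` and `study_probability`, and all numeric values are hypothetical. The paper's actual architecture is not specified in this excerpt.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(sentence_vecs: List[Vector], attn_w: Vector) -> Tuple[Vector, List[float]]:
    """Weight each sentence vector by a softmax-normalized score and sum them."""
    weights = softmax([dot(attn_w, vec) for vec in sentence_vecs])
    dim = len(sentence_vecs[0])
    pooled = [
        sum(a * vec[d] for a, vec in zip(weights, sentence_vecs))
        for d in range(dim)
    ]
    return pooled, weights


def study_probability(
    sentence_vecs: List[Vector], attn_w: Vector, clf_w: Vector, clf_b: float
) -> Tuple[float, List[float]]:
    """Study-level probability from the pooled vector via a logistic layer."""
    pooled, weights = attention_pool(sentence_vecs, attn_w)
    logit = dot(clf_w, pooled) + clf_b
    return 1.0 / (1.0 + math.exp(-logit)), weights


if __name__ == "__main__":
    # Three toy sentence vectors standing in for encoder outputs (made-up values).
    abstract = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
    prob, weights = study_probability(
        abstract, attn_w=[1.0, 1.0], clf_w=[1.5, 0.5], clf_b=-1.0
    )
    print(prob, weights)
```

In a trained model the attention and classifier parameters would be learned jointly. The attention weights also indicate which sentences contributed most to the study-level decision.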