[2602.21165] PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data
Summary
PVminer is a domain-adapted NLP framework that detects the patient voice (PV) in patient-generated text, such as secure patient-provider messages, so that this communication can be analyzed at scale.
Why It Matters
Manual qualitative coding is labor intensive and does not scale to the volume of patient-authored messages that health systems receive, yet this text carries patient perspectives on communication and social determinants of health (SDoH). By combining patient-specific BERT encoders with unsupervised topic modeling, PVminer aims to extract these signals automatically, which could support more patient-centered care.
Key Takeaways
- PVminer is a domain-adapted NLP framework for analyzing patient-generated text.
- The tool achieves high performance in detecting patient voice with F1 scores exceeding 80%.
- It integrates patient-specific BERT encoders and unsupervised topic modeling for enhanced semantic understanding.
- PVminer addresses the limitations of traditional qualitative coding frameworks in healthcare.
- The source code and annotated datasets will be publicly available for further research.
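Because PV detection is framed as a multi-label task, an F1 score has to be aggregated across labels and messages. The sketch below shows one common way to do that, micro-averaged F1, in plain Python. The summary does not say which averaging the authors report, and the label names and example predictions here are hypothetical, chosen only for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Each message has a set of gold labels and a set of predicted labels.
    True positives, false positives, and false negatives are pooled over
    all messages and labels before F1 is computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, not taken from the paper's codebook.
gold = [{"information_seeking", "housing"}, {"emotional_support"}, set()]
pred = [{"information_seeking"}, {"emotional_support", "housing"}, set()]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Micro-averaging weights every label decision equally, so frequent labels dominate the score. Macro-averaging (computing F1 per label, then taking the mean) would weight rare labels more heavily and can give a noticeably different number on imbalanced data.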
Computer Science > Computation and Language
arXiv:2602.21165 (cs)
[Submitted on 24 Feb 2026]
Title: PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data
Authors: Samah Fodeh, Linhai Ma, Yan Wang, Srivani Talakokkul, Ganesh Puthiaraju, Afshan Khan, Ashley Hagaman, Sarah Lowe, Aimee Roundtree
Abstract: Patient-generated text such as secure messages, surveys, and interviews contains rich expressions of the patient voice (PV), reflecting communicative behaviors and social determinants of health (SDoH). Traditional qualitative coding frameworks are labor intensive and do not scale to large volumes of patient-authored messages across health systems. Existing machine learning (ML) and natural language processing (NLP) approaches provide partial solutions but often treat patient-centered communication (PCC) and SDoH as separate tasks or rely on models not well suited to patient-facing language. We introduce PVminer, a domain-adapted NLP framework for structuring patient voice in secure patient-provider communication. PVminer formulates PV detection as a multi-label, multi-class prediction task integrating patient-specific BERT encoders (PV-BERT-base and PV-BERT-large), unsupervised topic modeling for thematic augmentation (PV-Topic-BERT), and fine-tuned classifiers for Cod...