[2602.21374] Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages

arXiv - Machine Learning

Summary

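This study evaluates small language models for extracting clinical information from medical transcripts in a low-resource language (Persian). It takes a privacy-preserving approach built on a two-step pipeline: transcripts are first translated into English, and open-source small language models then extract the clinical features using few-shot prompts, without fine-tuning.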

Why It Matters

As healthcare increasingly relies on natural language processing, this research addresses the challenge of extracting clinical data from text written in low-resource languages. By demonstrating effective methods for privacy-preserving information extraction, it supports better healthcare analytics in multilingual settings, which is essential for equitable healthcare delivery.

Key Takeaways

  • The study evaluates a two-step pipeline combining translation and small language models for clinical information extraction.
  • Larger models consistently outperform smaller ones in extracting clinical features, highlighting the importance of model scale.
  • Translating transcripts from Persian to English enhances sensitivity and reduces missing outputs, despite some trade-offs in precision.
  • Reliable extraction of physiological symptoms was achieved, but challenges remain for psychological complaints and complex features.
  • The research provides a practical framework for deploying language models in low-resource healthcare environments.
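The two-step pipeline in the takeaways above can be sketched as follows. This is a minimal illustration, not the authors' code: `translate` and `ask_model` are hypothetical callables standing in for the translation model and the small language model, the feature names are invented for the example, and the yes/no answer format is an assumption. A `None` result models a "missing output", i.e. an answer that cannot be parsed into a binary label.

```python
from typing import Callable, Dict, List, Optional


def parse_binary(answer: str) -> Optional[int]:
    """Map a free-text model answer to 1 (present), 0 (absent), or None (missing)."""
    words = answer.strip().lower().split()
    first = words[0].strip(".,:;!") if words else ""
    if first == "yes":
        return 1
    if first == "no":
        return 0
    return None  # unparsable answer, counted as a missing output


def run_pipeline(
    transcript: str,
    translate: Callable[[str], str],
    ask_model: Callable[[str, str], str],
    features: List[str],
) -> Dict[str, Optional[int]]:
    """Step 1: translate the transcript. Step 2: ask one yes/no question per feature."""
    english = translate(transcript)
    return {name: parse_binary(ask_model(english, name)) for name in features}


# Stubs so the sketch runs without any model; real calls would replace them.
def fake_translate(text: str) -> str:
    return "The patient reports severe pain but no nausea."


CANNED_ANSWERS = {"pain": "Yes.", "nausea": "No", "anxiety": "not sure"}


def fake_ask_model(english_text: str, feature: str) -> str:
    return CANNED_ANSWERS[feature]


# Hypothetical feature names; the summary does not list the 13 features.
result = run_pipeline(
    "(Persian transcript)", fake_translate, fake_ask_model,
    ["pain", "nausea", "anxiety"],
)
print(result)
```

Keeping the translation and extraction steps behind plain callables makes it easy to compare runs with and without translation, which is the comparison the takeaways describe.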

Computer Science > Computation and Language
arXiv:2602.21374 (cs)
[Submitted on 24 Feb 2026]

Title: Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages

Authors: Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei, Atena Farangi, AmirBahador Boroumand

Abstract: Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it ...
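To make the evaluation metrics named in the abstract concrete, here is a minimal sketch that computes sensitivity, specificity, macro-averaged F1, and MCC for a single binary feature. It assumes macro-F1 is the unweighted mean of the F1 scores of the positive and negative classes; the paper's exact averaging scheme is not stated in this excerpt, and the labels below are made up for illustration.

```python
from math import sqrt
from typing import Dict, Sequence


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute imbalance-aware metrics for one binary feature (labels are 0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = safe_div(tp, tp + fn)  # recall on the positive class
    specificity = safe_div(tn, tn + fp)  # recall on the negative class

    # F1 per class, then the unweighted (macro) mean of the two.
    f1_pos = safe_div(2 * tp, 2 * tp + fp + fn)
    f1_neg = safe_div(2 * tn, 2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2

    # Matthews Correlation Coefficient; defined as 0 when any margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, denom)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "macro_f1": macro_f1,
        "mcc": mcc,
    }


# Illustrative, imbalanced labels: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

On imbalanced data, plain accuracy can look high even when most positives are missed. Macro-F1 and MCC both penalize that failure, which is presumably why the abstract reports them alongside sensitivity and specificity.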
