[2601.03266] Benchmarking and Adapting On-Device LLMs for Clinical Decision Support
Computer Science > Computation and Language

arXiv:2601.03266 (cs)

[Submitted on 18 Dec 2025 (v1), last revised 27 Apr 2026 (this version, v2)]

Title: Benchmarking and Adapting On-Device LLMs for Clinical Decision Support

Authors: Alif Munim, Jun Ma, Omar Ibrahim, Alhusain Abdalla, Shuolin Yin, Leo Chen, Bo Wang

Abstract: Large language models (LLMs) have rapidly advanced in clinical decision-making, yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure. Open-source alternatives allow local inference but often have large model sizes that limit their use in resource-constrained clinical settings. Here, we benchmark on-device LLMs from the gpt-oss (20B, 120B), Qwen3.5 (9B, 27B, 35B), and Gemma 4 (31B) families across three representative clinical tasks: general disease diagnosis, specialty-specific (ophthalmology) diagnosis and management, and simulation of human expert grading and evaluation. We compare their performance with state-of-the-art proprietary models (GPT-5.1, GPT-5-mini, and Gemini 3.1 Pro) and a leading open-source model (DeepSeek-R1), and we further evaluate the adaptability of on-device systems by fine-tuning gpt-oss-20b and Qwen3.5-35B on general diagnostic data. Across tasks, on-device models achieve performance comparable to or exceedi...
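The abstract describes benchmarking models on a general disease-diagnosis task. As a minimal illustration only (not the paper's actual evaluation harness, whose metrics and answer format are not given here), a lenient exact-match accuracy scorer for predicted versus gold diagnoses might look like:

```python
# Illustrative sketch, not the authors' code: score a diagnostic
# benchmark by normalized exact-match accuracy. The normalization
# rule (lowercasing, stripping periods/whitespace) is an assumption.

def normalize(diagnosis: str) -> str:
    """Lowercase, drop periods, and collapse whitespace for lenient matching."""
    return " ".join(diagnosis.lower().replace(".", "").split())

def diagnostic_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of cases where the predicted diagnosis matches the gold label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

if __name__ == "__main__":
    preds = ["Acute appendicitis", "migraine", "Type 2 diabetes"]
    golds = ["acute appendicitis", "Tension headache", "type 2 diabetes."]
    print(diagnostic_accuracy(preds, golds))  # → 0.6666666666666666
```

A real harness for the paper's tasks would also need specialty-specific rubrics and an LLM-as-grader setup for the expert-grading simulation, neither of which is specified in the abstract.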