[2601.03266] Benchmarking and Adapting On-Device LLMs for Clinical Decision Support

arXiv - AI · 4 min read

About this article

Computer Science > Computation and Language

arXiv:2601.03266 (cs)
[Submitted on 18 Dec 2025 (v1), last revised 27 Apr 2026 (this version, v2)]

Title: Benchmarking and Adapting On-Device LLMs for Clinical Decision Support

Authors: Alif Munim, Jun Ma, Omar Ibrahim, Alhusain Abdalla, Shuolin Yin, Leo Chen, Bo Wang

Abstract: Large language models (LLMs) have rapidly advanced in clinical decision-making, yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure. Open-source alternatives allow local inference but often have large model sizes that limit their use in resource-constrained clinical settings. Here, we benchmark on-device LLMs from the gpt-oss (20b, 120b), Qwen3.5 (9B, 27B, 35B), and Gemma 4 (31B) families across three representative clinical tasks: general disease diagnosis, specialty-specific (ophthalmology) diagnosis and management, and simulation of human expert grading and evaluation. We compare their performance with state-of-the-art proprietary models (GPT-5.1, GPT-5-mini, and Gemini 3.1 Pro) and a leading open-source model (DeepSeek-R1), and we further evaluate the adaptability of on-device systems by fine-tuning gpt-oss-20b and Qwen3.5-35B on general diagnostic data. Across tasks, on-device models achieve performance comparable to or exceedi...

Originally published on April 29, 2026. Curated by AI News.
