[2603.20514] Evaluating Large Language Models on Historical Health

[2603.20514] Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study

arXiv - AI March 24, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.20514: Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study

Computer Science > Computation and Language arXiv:2603.20514 (cs) COVID-19 e-print Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field. [Submitted on 20 Mar 2026] Title:Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study Authors:Mohammed Rakibul Hasan View a PDF of the paper titled Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study, by Mohammed Rakibul Hasan View PDF HTML (experimental) Abstract:Large Language Models (LLMs) offer significant potential for delivering health information. However, their reliability in low-resource contexts remains uncertain. This study evaluates GPT-4, Gemini Pro, Llama~3, and Mistral-7B on health crisis-related enquiries concerning COVID-19, dengue, the Nipah virus, and Chikungunya in the low-resource context of Bangladesh. We constructed a question--answer dataset from authoritative sources and assessed model outputs through semantic similarity, expert-model cross-evaluation, and Natural Language Inference (NLI). Findings highlight both the strengths and limitations of LLMs in representing epidemiological history and health crisis knowledge, underscoring thei...

Originally published on March 24, 2026. Curated by AI News.

Llms

I Asked ChatGPT 500 Questions. Here Are the Ads I Saw Most Often | WIRED

Ads are rolling out across the US on ChatGPT’s free tier. I asked OpenAI's bot 500 questions to see what these ads were like and how they...

Wired - AI · 9 min · about 2 hours ago

Llms

Abacus.Ai Claw LLM consumes an incredible amount of credit without any usage :(

Three days ago, I clicked the "Deploy OpenClaw In Seconds" button to get an overview of the new service, but I didn't build any automatio...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Llms

Google’s Gemini AI app debuts in Hong Kong

Tech giant’s chatbot service tops Apple’s app store chart in the city.

AI Tools & Products · 2 min · about 3 hours ago

Llms

Google Launches Gemini Import Tools to Poach Users From Rival AI Apps

Anyone looking to switch their AI assistant will find it surprisingly easy, as it only takes a few steps to move from A to B. This is not...

AI Tools & Products · 4 min · about 3 hours ago

[2603.20514] Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study

About this article

Related Articles

I Asked ChatGPT 500 Questions. Here Are the Ads I Saw Most Often | WIRED

Abacus.Ai Claw LLM consumes an incredible amount of credit without any usage :(

Google’s Gemini AI app debuts in Hong Kong

Google Launches Gemini Import Tools to Poach Users From Rival AI Apps

No comments

Stay updated with AI News