[2504.18310] How much does context affect the accuracy of AI health advice?

arXiv · General Economics (econ.GN) · 4 min read

Summary

This article examines how linguistic and contextual factors influence the accuracy of AI-generated health advice, revealing significant disparities based on language, topic, and source.

Why It Matters

As AI systems are increasingly used in health communication, understanding their limitations in accuracy across different contexts is crucial. This research highlights the need for multilingual and domain-specific evaluations to ensure reliable health advice dissemination, particularly in diverse linguistic settings.

Key Takeaways

  • LLM accuracy in health advice varies significantly by language and context.
  • Performance is highest in English and closely related European languages.
  • Public-health claims show lower accuracy, especially in non-European languages.
  • Contextual factors like topic and source critically affect AI reliability.
  • Multilingual evaluations are essential before deploying AI in health communication.

Economics > General Economics — arXiv:2504.18310 (econ)

[Submitted on 25 Apr 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: How much does context affect the accuracy of AI health advice?

Authors: Prashant Garg, Thiemo Fetzer

Note: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior, and should not be reported in news media as established information without consulting multiple experts in the field.

Abstract: Large language models (LLMs) are increasingly used to provide health advice, yet evidence on how their accuracy varies across languages, topics and information sources remains limited. We assess how linguistic and contextual factors affect the accuracy of AI-based health-claim verification. We evaluated seven widely used LLMs on two datasets: (i) 1,975 legally authorised nutrition and health claims from UK and EU regulatory registers translated into 21 languages; and (ii) 9,088 journalist-vetted public-health claims from the PUBHEALTH corpus spanning COVID-19, abortion, politics and general health, drawn from government advisories, scientific abstracts and media sources. Models classified each claim as supported or unsupported using majority voting across repeated runs. Accuracy was analysed by ...
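The abstract describes classifying each claim by majority voting across repeated model runs. A minimal sketch of that evaluation step is shown below; `query_model` is a hypothetical callable (standing in for an LLM API call) that returns one label per invocation, not part of the paper's released code.

```python
from collections import Counter

def classify_claim(claim: str, query_model, runs: int = 5) -> str:
    """Label a health claim 'supported' or 'unsupported' by majority
    vote over several independent model runs.

    `query_model` is a hypothetical stand-in for an LLM call that
    returns a single label string for the given claim.
    """
    votes = Counter(query_model(claim) for _ in range(runs))
    # most_common(1) yields [(label, count)] for the winning label
    label, _count = votes.most_common(1)[0]
    return label

# Usage with a stubbed model that always answers 'supported':
result = classify_claim(
    "Vitamin C contributes to normal immune function.",
    lambda claim: "supported",
)
```

Repeating runs and taking the majority reduces the effect of sampling noise in any single model response, which is presumably why the authors aggregate this way.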
