[2504.18310] How much does context affect the accuracy of AI health advice?
Summary
This article examines how linguistic and contextual factors influence the accuracy of AI-generated health advice, revealing significant disparities based on language, topic, and source.
Why It Matters
As AI systems are increasingly used in health communication, understanding how their accuracy varies across contexts is crucial. This research highlights the need for multilingual and domain-specific evaluations to ensure that health advice is reliable, particularly in diverse linguistic settings.
Key Takeaways
- LLM accuracy in health advice varies significantly by language and context.
- Performance is highest in English and closely related European languages.
- Public-health claims show lower accuracy, especially in non-European languages.
- Contextual factors like topic and source critically affect AI reliability.
- Multilingual evaluations are essential before deploying AI in health communication.
Economics > General Economics
arXiv:2504.18310 (econ)
COVID-19 e-print. Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.
[Submitted on 25 Apr 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: How much does context affect the accuracy of AI health advice?
Authors: Prashant Garg, Thiemo Fetzer
Abstract: Large language models (LLMs) are increasingly used to provide health advice, yet evidence on how their accuracy varies across languages, topics and information sources remains limited. We assess how linguistic and contextual factors affect the accuracy of AI-based health-claim verification. We evaluated seven widely used LLMs on two datasets: (i) 1,975 legally authorised nutrition and health claims from UK and EU regulatory registers translated into 21 languages; and (ii) 9,088 journalist-vetted public-health claims from the PUBHEALTH corpus spanning COVID-19, abortion, politics and general health, drawn from government advisories, scientific abstracts and media sources. Models classified each claim as supported or unsupported using majority voting across repeated runs. Accuracy was analysed by ...
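The classification step described in the abstract (a supported/unsupported label per claim, decided by majority vote across repeated runs, then scored by context) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record fields (`language`, `truth`, `runs`), the tie-breaking rule, the number of runs, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label across repeated runs.

    Assumption: ties resolve to "unsupported"; the abstract does not
    say how ties are handled.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    return winners[0] if len(winners) == 1 else "unsupported"


def accuracy_by_group(records, key):
    """Share of claims whose majority-vote label matches the reference
    label, grouped by a context field such as language or topic."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        predicted = majority_vote(rec["runs"])
        total[rec[key]] += 1
        correct[rec[key]] += int(predicted == rec["truth"])
    return {group: correct[group] / total[group] for group in total}


# Toy records (hypothetical, not from the paper): three runs per claim.
records = [
    {"language": "en", "truth": "supported",
     "runs": ["supported", "supported", "unsupported"]},
    {"language": "en", "truth": "unsupported",
     "runs": ["unsupported", "unsupported", "unsupported"]},
    {"language": "xx", "truth": "supported",
     "runs": ["unsupported", "supported", "unsupported"]},
    {"language": "xx", "truth": "unsupported",
     "runs": ["unsupported", "unsupported", "supported"]},
]

print(accuracy_by_group(records, "language"))
```

Grouping by a different key (for example a hypothetical `topic` or `source` field) would give the other breakdowns the abstract mentions.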