[2504.18310] How much does context affect the accuracy of AI health advice?

arXiv · General Economics (econ.GN) · 4 min read

Summary

This article examines how linguistic and contextual factors influence the accuracy of AI-generated health advice, revealing significant disparities based on language, topic, and source.

Why It Matters

As AI systems are increasingly used in health communication, understanding their limitations in accuracy across different contexts is crucial. This research highlights the need for multilingual and domain-specific evaluations to ensure reliable health advice dissemination, particularly in diverse linguistic settings.

Key Takeaways

  • LLM accuracy in health advice varies significantly by language and context.
  • Performance is highest in English and closely related European languages.
  • Public-health claims show lower accuracy, especially in non-European languages.
  • Contextual factors like topic and source critically affect AI reliability.
  • Multilingual evaluations are essential before deploying AI in health communication.

Economics > General Economics — arXiv:2504.18310 (econ)

[Submitted on 25 Apr 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: How much does context affect the accuracy of AI health advice?

Authors: Prashant Garg, Thiemo Fetzer

Note: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior, and should not be reported in news media as established information without consulting multiple experts in the field.

Abstract: Large language models (LLMs) are increasingly used to provide health advice, yet evidence on how their accuracy varies across languages, topics and information sources remains limited. We assess how linguistic and contextual factors affect the accuracy of AI-based health-claim verification. We evaluated seven widely used LLMs on two datasets: (i) 1,975 legally authorised nutrition and health claims from UK and EU regulatory registers translated into 21 languages; and (ii) 9,088 journalist-vetted public-health claims from the PUBHEALTH corpus spanning COVID-19, abortion, politics and general health, drawn from government advisories, scientific abstracts and media sources. Models classified each claim as supported or unsupported using majority voting across repeated runs. Accuracy was analysed by ...
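The abstract describes classifying each claim by majority voting across repeated model runs. A minimal sketch of that evaluation step is shown below; `query_model` is a hypothetical callable (standing in for an LLM API call) that returns one label per invocation, not part of the paper's released code.

```python
from collections import Counter

def classify_claim(claim: str, query_model, runs: int = 5) -> str:
    """Label a health claim 'supported' or 'unsupported' by majority
    vote over several independent model runs.

    `query_model` is a hypothetical stand-in for an LLM call that
    returns a single label string for the given claim.
    """
    votes = Counter(query_model(claim) for _ in range(runs))
    # most_common(1) yields [(label, count)] for the winning label
    label, _count = votes.most_common(1)[0]
    return label

# Usage with a stubbed model that always answers 'supported':
result = classify_claim(
    "Vitamin C contributes to normal immune function.",
    lambda claim: "supported",
)
```

Repeating runs and taking the majority reduces the effect of sampling noise in any single model response, which is presumably why the authors aggregate this way.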
