Oxford study reveals AI chatbots fail to deliver reliable medical advice to patients

Reviewed by Nidhi Govil


A University of Oxford study published in Nature Medicine found that AI chatbots such as GPT-4o and Llama 3 provide medical advice no better than traditional internet searches. When 1,298 participants used the chatbots for health guidance, they identified relevant conditions in less than 34.5% of cases and chose the right course of action in less than 44.2%, revealing a significant gap between AI's potential and its real-world performance.

AI Chatbots Show No Advantage Over Traditional Methods for Medical Advice

A groundbreaking University of Oxford study published in Nature Medicine has revealed that AI chatbots provide medical advice no better than conventional internet searches, challenging the growing trend of patients turning to artificial intelligence for health guidance.[1] The research, led by Oxford's Internet Institute alongside medical practitioners, tested three prominent large language models (LLMs): OpenAI's GPT-4o, Meta's Llama 3, and Cohere's Command R+.[1] When tested in isolation, these models demonstrated impressive diagnostic accuracy, identifying medical conditions in 94.9% of cases.[1] However, real-world application painted a starkly different picture.

Source: Korea Times

Communication Breakdown Exposes Risks of AI Health Advice

The study involved 1,298 participants in Britain who were presented with 10 medical scenarios, ranging from common ailments like headaches after drinking to life-threatening conditions such as a subarachnoid haemorrhage, a type of bleeding on the brain.[1][3] When participants used AI for medical advice, relevant conditions were identified in less than 34.5% of cases, and the correct course of action was determined in less than 44.2% of instances, figures no better than those of a control group using traditional resources like internet search engines or the National Health Service website. Dr. Adam Mahdi, co-author and associate professor at Oxford, emphasized the "huge gap" between AI's potential and its practical performance, noting that "the knowledge may be in those bots; however, this knowledge doesn't always translate when interacting with humans".[1]

Dangers of Relying on AI for Medical Guidance Become Apparent

The research found that the chatbots' inaccurate medical advice stemmed from both human error and flawed AI responses. Detailed analysis of approximately 30 patient interactions revealed that humans often provided incomplete information, while the AI systems generated misleading responses in critical situations.[1] In one alarming example involving symptoms of a subarachnoid haemorrhage, a patient describing the "worst headache ever" received correct advice to go to the hospital, while another patient with identical symptoms who described a "terrible" headache was told to simply lie down in a darkened room.[1] Dr. Rebecca Payne, lead medical practitioner on the study, warned that asking chatbots about symptoms "can be dangerous, giving wrong diagnoses and failing to recognise when urgent help is needed".[3]

Human Interaction with AI Reveals Fundamental Challenges

The study identified specific patterns in how human interaction with AI breaks down during medical consultations. Dr. Mahdi explained that "people share information gradually" and "leave things out," creating confusion when the AI listed multiple possible conditions.[2] Participants struggled to distinguish useful health information from irrelevant details, and the quality of responses varied dramatically depending on how questions were worded.[2] This communication breakdown represents a critical challenge for large language models in healthcare, even as these systems continue to ace medical licensing exams.[3]

The risks extend beyond individual cases: one in six U.S. adults now consults AI chatbots about health information at least once a month, with adoption rates expected to climb.[3] In the UK, polling by Mental Health UK found that more than one in three residents use AI to support their mental health or wellbeing.[2] The research team plans to conduct similar studies across different countries and languages to determine whether cultural and linguistic factors affect AI's performance, while experts call for clear national regulations and medical guidelines governing the use of AI versus internet search for health queries.[1][2]

Source: Reuters
