AI chatbots miss most medical diagnoses as millions seek health advice from ChatGPT

Reviewed by Nidhi Govil

More than 40 million people consult ChatGPT daily for health information, but new research reveals AI chatbots correctly identify medical conditions only 34.5% of the time. Studies show these tools undertriage 52% of emergency cases and provide correct follow-up steps just 44.2% of the time, raising urgent questions about patient safety as AI healthcare becomes mainstream.

AI Healthcare Tools Fail Most Medical Diagnoses

A groundbreaking study published in Nature Medicine reveals that AI chatbots for medical advice are missing the mark far more often than users might expect. When 1,298 participants in the UK used large language models (LLMs) such as ChatGPT or Meta's Llama 3 for medical guidance, the tools correctly identified medical conditions in fewer than 34.5% of cases [1]. This comes at a critical time when OpenAI reports that more than 40 million people consult its platform daily for health-related questions, with 1 in 4 of ChatGPT's approximately 800 million regular users submitting a healthcare prompt every week [5].

Source: Digit

The study's findings challenge the growing confidence in AI for health information. While large language models now post scores on medical knowledge benchmarks comparable to a passing grade on the US Medical Licensing Exam, and clinical documents written by LLMs are rated as equivalent to or better than those written by doctors, real-world application tells a different story [1]. After the initial diagnosis, the AI tools provided correct follow-up steps just 44.2% of the time, highlighting a dangerous gap between theoretical capability and practical performance.

ChatGPT Health Underestimates Medical Emergencies

Researchers at the Icahn School of Medicine at Mount Sinai conducted the first independent safety evaluation of ChatGPT Health since its January 2026 launch, focusing on triage, the critical process of assessing the severity of a patient's condition. The results were alarming: ChatGPT Health undertriaged 52% of emergency cases, directing patients with diabetic ketoacidosis and impending respiratory failure to evaluation within 24-48 hours rather than to the emergency department [3]. A separate Nature study confirmed that among gold-standard emergencies, ChatGPT undertriaged over half of cases [2].

Lead author and urologist Ashwin Ramaswamy explained the study's core question: "If someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" The answer, most of the time, is no [3]. In one respiratory failure case, the AI correctly identified symptoms as early warning signs but reassured the patient to wait and monitor rather than urging them to seek emergency help. The system did correctly triage textbook emergencies like stroke and anaphylaxis, but failed in nuanced situations where clinical judgment matters most.

Relying on AI for Diagnoses: The Input Problem

The inaccurate medical advice often stems from how users interact with these systems. Andrew Bean, who studies AI systems at Oxford University, notes that "people don't know what they are supposed to be telling the model" [4]. The study found that in 16 of 30 sampled interactions, initial messages contained only partial information about symptoms [1].

Source: CNET

Word choice proves critical. In one scenario, two users described the same condition differently. One mentioned "the worst headache I've ever had" and was directed to the emergency room immediately. The other, who didn't use that explicit description, was told to take aspirin and stay home, despite the condition being life-threatening [4]. Duke University's Monica Agrawal warns that "if they have incomplete context or they share a subjective impression or they have a misconception when they're seeking advice, LLMs have an ability more so than a doctor to reinforce those misconceptions" [5].

Paradoxically, the study revealed that in two cases, LLMs initially gave correct responses but introduced new, incorrect information after users supplied additional details, suggesting that conversing with the chatbots for longer did not improve the odds of receiving accurate guidance [1].

Patient Safety Concerns and Inconsistent Crisis Alerts

The Mount Sinai study uncovered particularly concerning findings about suicide-risk alerts, which "appeared inconsistently" and were "inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves" [3]. Mount Sinai Health System's chief AI officer Girish Nadkarni called this "a particularly surprising and concerning finding," noting that in real life, when someone talks about exactly how they would harm themselves, that is a sign of more immediate and serious danger, not less.

Dr. Alexa Mieses Malchuk, a family physician who uses AI to streamline administrative work, fears these tools could give people a false sense of security by telling them they don't need to see a doctor or have a condition examined [2]. "That could be a missed opportunity to diagnose something early," she warned. She has also noticed that patients are now less willing to share that they have done their own research with these tools, yet more certain about what they believe their diagnosis to be.

The Accessibility Argument and Real-World Context

Despite the missed medical diagnoses and the concerns about AI diagnostic accuracy, some medical professionals argue these tools still provide value in a healthcare system plagued by accessibility issues. According to OpenAI's research, 7 in 10 healthcare-related conversations happen outside normal clinic hours, and an average of more than 580,000 healthcare inquiries in the U.S. come from "hospital deserts," places more than a 30-minute drive from a general medical or children's hospital [3].

"We've made accessibility to medical information and medical judgment so hard in this country, and ChatGPT makes it so easy," said Ashish Jha, former White House COVID response coordinator under President Biden

5

. Robert Wachter, chair of the Department of Medicine at UCSF, argues that with healthcare difficult to afford and access, "the advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin"

4

.

How Medical Professionals Recommend Using AI

Adam Rodman, a hospitalist who researches AI programs at Harvard Medical School, discourages people from using AI to triage emergencies but sees value when the tools are used appropriately. "A good time to use a large language model is when you're about to go see a doctor—or after you see your doctor," Rodman advises [4]. This approach supports patient education and makes time with medical professionals more efficient.

Mieses Malchuk recommends using AI as a springboard, not as the be-all and end-all of self-diagnosis. She notes that chatbot users may omit important information about their medical situations, leading to fundamentally different diagnosis or treatment recommendations. "Their responses are only as good as the questions we ask," she explains [2]. American Medical Association CEO John Whyte emphasizes that "too often people are using this as an expert and not as an assistant" [5].

The Confidence Problem and Overreliance Risks

One significant issue is how AI presents information. "I worry some of these LLMs speak with a level of confidence that is really unjustified," Jha noted [5]. Models are generally built to tell people what they want to hear, and "in the places where a doctor might push back... we're not seeing necessarily the same behavior in models," Agrawal added. Most people using AI don't have the medical expertise to spot mistakes, creating a divide between how professionals and laypeople can safely use these tools.

A survey by OpenAI found that 3 in 5 US adults report using AI for health, consulting it when they first feel unwell, to prepare for clinician visits, and to better understand patient instructions [1]. Yet despite ChatGPT's small disclaimer that "ChatGPT can make mistakes. Check important info," many people take the chatbot's word as fact. Recent polling from the Annenberg Public Policy Center found that 63% of respondents consider AI-generated health information reliable, even as public trust in federal agencies like the CDC, FDA, and NIH fell by 5-7% over the past year [2].

Regulatory Guardrails and the Path Forward

The technology is burrowing deeper into the healthcare industry thanks to a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary told Fox Business that some devices and software can provide health information without FDA regulation [3]. However, experts are watching what new state and federal regulatory guardrails will be put around AI in healthcare. David Blumenthal, former president of the Commonwealth Fund, suggests that "it's possible that rating agencies may arise that will address the reliability of different chatbots for different functions" [5].

Source: Axios

An OpenAI spokesperson responded to the studies by asserting that ChatGPT should be thought of as a work in progress, with safety updates and improvements still coming to enhance how the chatbot handles sensitive situations [3]. Karan Singhal, who leads OpenAI's health AI team, claimed its latest GPT-5 models correctly refer emergency cases nearly 99% of the time, noting that in real life, health conversations typically unfold over multiple turns in which the model asks follow-up questions [5]. However, critics point out that the careful debate over how AI should be deployed and evaluated in clinical settings often fails to acknowledge that direct-to-consumer use is already widespread.
