2 Sources
[1]
ChatGPT Health Underestimates Medical Emergencies, Study Finds
A group of researchers at the Icahn School of Medicine at Mount Sinai say they have conducted the first independent safety evaluation of OpenAI's ChatGPT Health assistant since the tool launched in January 2026. "We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" lead author and urologist Ashwin Ramaswamy said in a press release. It turns out that the answer, most of the time, is no.

In a controlled study, the researchers tested how good ChatGPT Health was at assessing the severity of a patient's condition, a process called "triage" in medicine. They found that ChatGPT Health "under-triaged" 52% of emergency cases, "directing patients with diabetic ketoacidosis and impending respiratory failure to 24-48 hour evaluation rather than the emergency department." In the respiratory failure case, the AI clearly identified the symptoms as an early warning sign, but reassured the patient, advising them to wait and monitor rather than urging them to seek emergency help. The system did correctly triage more "textbook emergencies" like stroke and anaphylaxis. But the researchers say the nuanced situations ChatGPT Health failed at are exactly where clinical judgment matters most.

OpenAI launched ChatGPT Health earlier this year, after releasing a report saying that more than 40 million people around the world had been turning to the company's chatbot daily for health advice. The OpenAI study behind that number also found that 7 in 10 of those healthcare-related conversations were happening outside of normal clinic hours, and that an average of more than 580,000 healthcare inquiries in the U.S. were sent from "hospital deserts," aka places that are more than a 30-minute drive from a general medical or children's hospital.

As users increasingly seek out AI for healthcare inquiries, the technology is burrowing deeper into the healthcare industry thanks to a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary told Fox Business earlier this year that some devices and software can provide health information without FDA regulation.

But that doesn't negate the very real and documented physical and mental health risks that come with an overreliance on AI. OpenAI specifically has come under intense fire for how its chatbots have handled mental health episodes in the past, with grieving families suing the company over negligent behavior and insufficient safety guardrails that they say aided suicidal ideation in relatives. In response, OpenAI has said it will take action on the matter, focusing on safety measures such as parental controls for minors and nudges for users to take a break.

ChatGPT Health, for example, directs users to professional help in high-risk cases. But the Mount Sinai study found that the suicide-risk alerts "appeared inconsistently." "The system's alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves. In real life, when someone talks about exactly how they would harm themselves, that's a sign of more immediate and serious danger, not less," Mount Sinai Health System's chief AI officer Girish Nadkarni said. "This was a particularly surprising and concerning finding."
An OpenAI spokesperson asserted that ChatGPT Health should be thought of as a work in progress, with safety updates and improvements still to come that are meant to enhance the way the chatbot deals with sensitive situations. The study, the spokesperson pointed out, evaluates immediate triage decisions in a controlled setting, whereas in real-world scenarios users, and even the chatbot itself, often have follow-up questions that can change the risk assessment. They also noted that ChatGPT Health is still offered on a limited basis, and users who wish to join must enter a waiting list.
[2]
Is ChatGPT Health safe? Study finds AI missed half of medical emergencies
We've all been there: Googling symptoms at midnight, convinced that the slight itch in your throat is the beginning of something sinister, only for the doctor to look up from their clipboard and tell you it's a common cold. The internet, it turns out, has always had a flair for the dramatic.

But what if the opposite happened? What if you asked an AI, one built specifically to help you navigate your health, about your symptoms, and it told you not to worry? And what if, this time, there actually was something to worry about?

A study published in Nature Medicine suggests ChatGPT Health may be doing exactly that. And here's the kicker: you can't even use it yet. ChatGPT Health is still waitlisted. OpenAI hasn't fully released it to the public, saying it still needs to improve the tool's safety and reliability before a wider rollout.

Researchers at Mount Sinai Hospital tested the chatbot across 60 real medical scenarios. In more than half of genuine emergencies (51.6%, to be exact), the bot told patients to book an appointment within the next day or two. Not call an ambulance, not go to a hospital. Just wait. We're not talking close calls. We're talking respiratory failure and diabetic ketoacidosis, conditions that kill within hours if left untreated. "Any doctor, and any person who's gone through any degree of training, would say that patient needs to go to the emergency department," lead study author Dr. Ashwin Ramaswamy told NBC News.

The inconsistency is what makes it so unsettling. Stroke, with its unmistakable symptoms, was correctly flagged as an emergency every single time. But subtler crises flew under the radar. Meanwhile, a patient with a three-day sore throat was urgently told to see a doctor. The bot, as Ramaswamy put it, was "inverted to clinical risk."

OpenAI pushed back, arguing the study doesn't reflect how ChatGPT Health is designed to work: as an ongoing conversation, not a single query. That may be true. But with over 40 million people already turning to ChatGPT for health advice on the regular, general-purpose version, the trajectory here is clear. This product is coming. The waitlist won't last forever. As Dr. John Mafi of UCLA Health puts it, "Before you roll something like this out to make life-affecting decisions, you need to rigorously test it."

AI healthcare has real promise, especially for people living far from medical facilities or those who can't get an appointment for weeks. And right now, researchers are unambiguous: in a real emergency, call a professional for help. Don't ask a chatbot first.
A Mount Sinai study found ChatGPT Health failed to recognize 52% of genuine medical emergencies, telling patients with life-threatening conditions like respiratory failure to wait rather than seek immediate care. The AI tool remains on a waitlist as OpenAI works to improve safety features, even as over 40 million people already use ChatGPT for health advice.
Researchers at the Icahn School of Medicine at Mount Sinai have conducted the first independent safety evaluation of OpenAI's ChatGPT Health assistant since its January 2026 launch, uncovering troubling gaps in the AI's ability to assess medical emergencies [1]. The study, published in Nature Medicine, tested the chatbot across 60 real medical scenarios and found that ChatGPT Health "under-triaged" 52% of emergency cases, directing patients with life-threatening conditions like diabetic ketoacidosis and respiratory failure to schedule appointments within 24-48 hours rather than urging them to visit the emergency department [2].
"We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" lead author and urologist Ashwin Ramaswamy explained
1
. The answer, most of the time, was no. In one respiratory failure case, the AI clearly identified the symptoms as an early warning sign but reassured the patient to wait and monitor instead of seeking emergency help1
.While the system correctly handled "textbook emergencies" like stroke and anaphylaxis, it faltered in nuanced situations where clinical judgment matters most. Dr. Ramaswamy told NBC News that conditions requiring immediate attention were missed medical emergencies that "any doctor, and any person who's gone through any degree of training, would say that patient needs to go to the emergency department"
2
. The inconsistency proved particularly unsettling: stroke symptoms were flagged as emergencies every time, yet a patient with a three-day sore throat received more urgent guidance than someone experiencing early respiratory failure2
.
The triage process, which assesses the severity of a patient's condition, represents a critical first step in medical care. The Mount Sinai study found that ChatGPT Health's guidance was "inverted to clinical risk," providing more reliable alerts for lower-risk scenarios than for genuinely dangerous situations [2].

The study uncovered another alarming issue: suicide-risk alerts appeared inconsistently across the platform. "The system's alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves," explained Mount Sinai Health System's chief AI officer Girish Nadkarni [1]. "In real life, when someone talks about exactly how they would harm themselves, that's a sign of more immediate and serious danger, not less. This was a particularly surprising and concerning finding" [1].
These patient safety concerns come as OpenAI faces ongoing scrutiny over how its chatbots handle mental health episodes, with grieving families filing lawsuits alleging negligent behavior and insufficient safety guardrails that aided suicidal ideation in relatives [1].
OpenAI launched ChatGPT Health after reporting that more than 40 million people worldwide were already using the company's chatbot daily for health advice [1]. The company's research found that 7 in 10 healthcare-related conversations occurred outside normal clinic hours, with an average of more than 580,000 healthcare inquiries in the U.S. originating from "hospital deserts," areas more than a 30-minute drive from a general medical or children's hospital [1].
Despite this massive uptake, ChatGPT Health remains on a waitlist with limited access as OpenAI continues development [1][2]. An OpenAI spokesperson characterized the tool as a work in progress, with safety updates and improvements still forthcoming to enhance how the chatbot handles sensitive situations [1]. The company argued that the study evaluates immediate triage decisions in a controlled setting, whereas real-world scenarios typically involve follow-up questions that can change the risk assessment [1].
As users increasingly seek AI tools for medical guidance, the technology continues expanding into the healthcare industry through a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary stated that some devices and software can provide health information without FDA regulation [1]. However, this regulatory flexibility raises questions about oversight as AI systems demonstrate significant gaps in recognizing critical symptoms.

"Before you roll something like this out to make life-affecting decisions, you need to rigorously test it," Dr. John Mafi of UCLA Health emphasized [2]. While AI healthcare has potential for people living far from medical facilities or facing long appointment wait times, researchers remain unambiguous: in a real emergency, call a professional for help rather than consulting a chatbot first [2]. The study underscores that AI systems designed to provide health advice must undergo rigorous testing before deployment, particularly when life-threatening conditions hang in the balance.