7 Sources
[1]
AI Chatbots Miss More Than Half of Medical Diagnoses, Study Finds
During the study, 1,298 participants in the UK were asked to use a large language model, such as ChatGPT or Meta's Llama 3, for medical advice. When used in this way, the LLM correctly identified medical conditions in fewer than 34.5% of cases. The study acknowledged that LLMs now achieve scores on medical knowledge benchmarks comparable to passing the US Medical Licensing Exam, and that clinical documents from LLMs "are rated as equivalent to or better than those written by doctors." However, a problem emerged when the study's participants tried to reach those same results by questioning the LLMs themselves: most were unsuccessful. This is because users often didn't provide enough information, the study found. It reports that in 16 of 30 sampled interactions, initial messages contained only partial information. "In two cases, LLMs provided initially correct responses but added new and incorrect responses after the users added additional details," the study said, suggesting that conversing more with the chatbots did not improve the probability of receiving a correct medical diagnosis. After the initial diagnosis, the LLMs provided the correct follow-up steps to the person just 44.2% of the time. According to a survey by OpenAI, the maker of ChatGPT, 3 in 5 US adults report using AI for health. "They are using AI to get information when they first feel unwell, consulting it to prepare for their visits with their clinicians, and using it to better comprehend patient instructions and recommendations," OpenAI stated. Although there's a small disclaimer on ChatGPT's website that reads, "ChatGPT can make mistakes. Check important info," many people take the chatbot's word as fact. The study serves as a reminder that ChatGPT and similar chatbots should not be relied upon for medical guidance, particularly in serious situations.
[2]
The good, bad, and ugly of AI healthcare, according to a doctor who uses AI
ZDNET's key takeaways: People are turning to AI for health advice. It can get lots wrong. One doctor offers her advice on using AI. You can find health advice anywhere these days, regardless of credibility or medical expertise. This increased information availability has changed how people interact with medical professionals -- or whether they trust them in the first place. This broader access to health-related guidance also arrives amid historically low levels of trust in the healthcare system. A new poll from the Annenberg Public Policy Center finds that public trust in federal agencies like the Centers for Disease Control, the Food and Drug Administration, and the National Institutes of Health decreased by 5-7% over the past year. Whether or not the tech world is capitalizing on this declining trust, it's certainly making medical alternatives more convenient. The reality is that people are turning to this often free, always available, and quick-to-use technology for answers that a doctor or medical professional would once provide. A recent survey found that 63% of respondents find AI-generated health information reliable, according to Annenberg. Google, OpenAI, and Anthropic, three of the major AI players, have built health-oriented large language models (LLMs) for healthcare professionals. Rumors are circulating that Apple could be developing its own health AI, and Oura just launched an experimental custom women's health LLM. For Dr. Alexa Mieses Malchuk, the technology has changed how her patients interact with her -- and how this family physician does her job. AI can give users thorough explanations and answers to every health query under the sun. But it can also get lots wrong. In an interview with ZDNET, Mieses Malchuk discussed the usefulness and pitfalls of health AI, and how patients should approach the technology.
How she uses AI
Mieses Malchuk isn't AI-intolerant. In fact, she uses it to streamline administrative work, such as triaging patient messages and creating anticipatory guidance before a visit. AI companies continue to build more software for doctors and medical professionals. Just last week, Amazon and Google announced their own healthcare software products for scheduling doctors' appointments, clinical documentation, and medical coding. Administrative burdens in medicine have historically been an issue for doctors, who report spending more time completing paperwork than serving patients face-to-face. "There are really neat and cool things like that happening all over healthcare that have kind of streamlined the work of a primary care physician," Mieses Malchuk explained. Still, she's aware of the technology's limitations.
AI as a springboard
For medical nonprofessionals, she recommends using AI as a springboard, not as the end-all, be-all for medical advice. It can be satisfying to immediately receive an answer from one of these chatbots, and sometimes the AI's response can provide a sense of certainty that assuages worries, but she reminds users that these tools cannot diagnose conditions -- and that most patients sifting through these responses aren't medically trained to know wrong from right.
AI chatbot users may be omitting important information about their medical situations, leading to a fundamentally different diagnosis or treatment, Mieses Malchuk said. "Their responses are only as good as the questions we ask." "It's not that people without medical training shouldn't have access to AI. They should be partnering with their primary care physician to help sift through what they're finding online." As these AI health tools have grown in popularity, she's seen patients come to her less willing to share that they've done their own research using these tools -- but more certain about what they believe their diagnosis to be. "Even in medicine, there's not always 100% certainty about anything. On one hand, it's great that we live in this day and age where we have access to information literally at our fingertips, but there are some real downsides to that," she noted. Mieses Malchuk fears AI tools like ChatGPT could give people a false sense of security, telling people they don't have to go to the doctor or get a condition examined. "That could be a missed opportunity to diagnose something early," she said. Among gold-standard emergencies, a recent study in Nature found that ChatGPT undertriaged over half of cases and directed patients to a 24-48-hour evaluation rather than the emergency department. "Our findings reveal missed high-risk emergencies and inconsistent activation of crisis safeguards, raising safety concerns that warrant prospective validation before consumer-scale deployment of artificial intelligence triage systems," the authors write.
How AI can help patients
Mieses Malchuk recommends using AI health tools for general wellness advice. Maybe a patient was recently diagnosed with celiac disease and wants to know which foods they should and shouldn't eat. AI can create a meal plan, generate ideas, and provide helpful recommendations. It's also great for workout planning, and it's quite easy to create a customized workout regimen with the help of an AI tool. All in all, it's a great wellness tool for those without medical training. But leave the diagnostics and treatments to the professionals. "Mistrust in the medical system is growing, which is really a travesty. We take this oath to first do no harm, so the idea that these other resources are giving patients this false sense of confidence and making them think they can completely bypass seeing a physician -- it's an unfortunate step point," Mieses Malchuk said.
[3]
ChatGPT Health Underestimates Medical Emergencies, Study Finds
A group of researchers at the Icahn School of Medicine at Mount Sinai say they have conducted the first independent safety evaluation of OpenAI's ChatGPT Health assistant since the tool launched in January 2026. "We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" lead author and urologist Ashwin Ramaswamy said in a press release. It turns out that the answer, most of the time, is no. In a controlled study, the researchers tested how good ChatGPT Health was at assessing the severity of a patient's condition, a process called "triage" in medicine. The researchers found that ChatGPT Health "under-triaged" 52% of emergency cases, "directing patients with diabetic ketoacidosis and impending respiratory failure to 24-48 hour evaluation rather than the emergency department." In the respiratory failure case, the AI clearly identified the symptoms as an early warning sign, but reassured the patient to wait and monitor instead of urging them to seek emergency help. The system did triage more "textbook emergencies" like stroke and anaphylaxis correctly, though. But the researchers say that the nuanced situations that ChatGPT Health failed at are where clinical judgment matters the most. OpenAI launched ChatGPT Health earlier this year, after releasing a report saying that more than 40 million people around the world had been resorting to the company's chatbot daily for health advice. The OpenAI study where that number came from also found that 7-in-10 of those healthcare-related conversations were happening outside of normal clinic hours, and an average of more than 580,000 healthcare inquiries in the U.S. were sent from "hospital deserts," aka places that are more than a 30-minute drive from a general medical or children's hospital. As users increasingly seek out AI for healthcare inquiries, the technology is burrowing deeper into the healthcare industry thanks to a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary told Fox Business earlier this year that some devices and software can provide health information without FDA regulation. But that doesn't negate the very real and documented physical and mental health risks that come with an overreliance on AI. OpenAI specifically has been under intense scrutiny for how its chatbots have dealt with mental health episodes in the past, with grieving families suing the company over negligent behavior and insufficient safety guardrails that they say aided suicidal ideation in relatives. In response, OpenAI has said it will take action on the matter, focusing on safety measures such as parental controls for minors and nudges for users to take a break. ChatGPT Health, for example, directs users to professional help in high-risk cases. But the Mount Sinai study found that the suicide-risk alerts "appeared inconsistently." "The system's alerts were inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves. In real life, when someone talks about exactly how they would harm themselves, that's a sign of more immediate and serious danger, not less," Mount Sinai Health System's chief AI officer Girish Nadkarni said. "This was a particularly surprising and concerning finding."
An OpenAI spokesperson asserted that ChatGPT should be thought of as a work in progress, with safety updates and improvements still to come that are meant to enhance the way the chatbot deals with sensitive situations. The study, the spokesperson pointed out, evaluates immediate triage decisions in a controlled setting, whereas in real-world scenarios, users, and even the chatbot itself, often have follow-up questions that can change the risk assessment. They also noted that ChatGPT Health is still offered on a limited basis, and users who wish to join must first enter a waiting list.
[4]
ChatGPT might give you bad medical advice, studies warn
As tech companies roll out platforms specifically designed for health care consultation, AI is rapidly becoming a key player in many people's medical decisions. According to OpenAI, the maker of ChatGPT, more than 40 million people consult the platform every day for health information. But new research suggests AI may mislead users in certain medical scenarios. One risk: While AI puts vast medical knowledge at your fingertips, many laypeople don't know how to harness it effectively. In a study published recently in the journal Nature Medicine, researchers tried to simulate how people use AI chatbots by giving participants medical scenarios and asking them to consult AI tools. After conversing with the bots, participants correctly identified the hypothetical condition only about a third of the time. Only 43% made the correct decision about next steps, such as whether to go to the emergency room or stay home. "People don't know what they are supposed to be telling the model," says Andrew Bean, who studies AI systems at Oxford University and was one of the authors on this study. Bean says that when using AI, arriving at a helpful conclusion often comes down to word choice. "Doctors are trained to ask you questions about symptoms you might not have realized you should have mentioned," says Bean. In one scenario, two different users gave slightly different depictions of the same scenario. One of them described "the worst headache I've ever had," and was directed by the AI to go to the emergency room immediately. The other - who did not use that explicit description - was told to take aspirin and stay home. "Turns out this was actually a life-threatening condition," says Bean. There are some instances when AI excels at identifying medical issues -- in some studies, large language models have sometimes matched or even outperformed physicians on diagnostic reasoning tasks. But the way people use AI chatbots, says Bean, is far more messy than the controlled, clinical situations in which it performs well.
Correct diagnosis, wrong advice
Even in circumstances where AI is able to correctly identify the condition, it often does not present the next steps with the appropriate amount of urgency, according to another study. Researchers presented the AI bots with different medical scenarios. In 52% of emergency cases, the bots "under-triaged," meaning they treated the ailment as less serious than it was. In one example, the bot failed to direct a hypothetical patient with diabetic ketoacidosis and impending respiratory failure -- a life-threatening condition -- to go to the emergency department. "When there was a textbook medical emergency, ChatGPT got it right," said Girish Nadkarni, a doctor and AI researcher at Mount Sinai who is an author on the study. The problem, said Nadkarni, is when there were more complicated scenarios in which there was an "element of time" at play - the bot often both over- and underestimated the amount of time a patient could wait until pursuing care. A spokesperson from OpenAI said this study did not represent the way people actually use ChatGPT, and that the previous study used an older version of ChatGPT that the company says has since been updated to address some of the concerns that surfaced.
Finding the utility of AI in medicine
Despite concerns about inaccuracy, doctors who study AI believe there is value in patients using it for health care information, and point to times it has even provided lifesaving advice.
"I encourage patients to use these tools," says Robert Wachter, a doctor at UC San Francisco and author of the recently published book, A Giant Leap: How AI Is Transforming Health Care and What That Means for Our Future. Wachter argues that with health care difficult to afford and access, consulting AI is still often better than the alternatives. "The advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin," says Wachter. Still, Wachter stresses, AI is not a replacement for a doctor. Adam Rodman, a hospitalist who researches AI programs at Harvard Medical School, discourages people from using AI to triage emergency situations, but says AI can add significant value to a patient's interaction with a human medical practitioner. "A good time to use a large language model is when you're about to go see a doctor -- or after you see your doctor," says Rodman. It can help you become more informed about your condition in advance of an appointment and use time with your providers efficiently, he says, giving patients the opportunity to partner with their doctor on decisions rather than engage in lengthy question and answer sessions. "There are no downsides to better understanding your health," says Rodman. AI in health care is here to stay Doctors interviewed for this story acknowledge that AI and medicine are already inextricably entangled and imagine that both AI and humans will become more skilled at engaging with each other. " My hope is that you might see AI as an extension of a human relationship," says Rodman. He imagines a future where both doctors and humans partner with AI in order to facilitate communication and overcome medical bureaucracy. Rodman says there is a risk in AI. He fears a time when humans would be informed of scary diagnoses -- such as cancer -- by a bot, rather than a human. Studies show that when health care is treated more like a business or marketplace product, people trust doctors less. "What I hope is that this technology can be used in a way that enhances humanity in medicine," says Rodman "and not in a way that cuts out the doctor-patient relationship."
[5]
The era of Doctor AI is already here
Why it matters: This is opening access to medical information in an entirely new way. The problem is, that advice may not always be very good. Where it stands: OpenAI put out some numbers in January: More than 40 million people ask ChatGPT health care-related questions every day, and 1 in 4 of the tool's approximately 800 million regular users submits a health care prompt every week. * The careful debate over how AI should be deployed, regulated and evaluated in clinical settings often fails to acknowledge that the cat's already out of the bag when it comes to direct-to-consumer use. * "Too often people are using this as an expert and not as an assistant," American Medical Association CEO John Whyte told Axios in an interview. Between the lines: Everyone pretty much agrees that you shouldn't replace your doctor with AI, at least not yet. But a more realistic question is how helpful it is when your doctor isn't available -- or if you don't have one. * "We've made accessibility to medical information and medical judgment so hard in this country, and ChatGPT makes it so easy," said Ashish Jha, the former White House COVID response coordinator under President Biden and former dean of the Brown University School of Public Health. * "The idea that these tools have to be as good as a physician is absurd given how much more convenient they are." * "I think there's a risk of bad things happening. ... Is it dangerous? I think the status quo is dangerous," said Bob Wachter, chair of the Department of Medicine at UCSF and the author of "A Giant Leap: How AI is Transforming Healthcare and What That Means for Our Future." * "The question is without it, what would you have done?" Wachter added. Driving the news: A recent study published in Nature found that ChatGPT under-triaged about half of health care emergencies in a test performed by researchers. * Karan Singhal, who leads the company's health AI team, said its latest GPT-5 models correctly refer emergency cases nearly 99% of the time. In real life, Singhal said, health conversations in ChatGPT typically unfold over multiple turns, where the model asks follow-up questions and gathers more context before responding. What we're watching: What new state and federal guardrails will be put around AI in health care. * "We don't regulate the availability of information in the United States," said David Blumenthal, former president of the Commonwealth Fund, but "it's possible that rating agencies may arise that will address the reliability of different chatbots for different functions." Some takeaways from my conversations with experts: 1. AI seems to be better at some things than others. Chatbots can be good at explaining lab results or coming up with a list of questions to ask your doctor ahead of a visit, Whyte said. * That doesn't mean people are actually using it for what it's good at. * Jha, who said that large language models aren't yet "ready for prime time" when it comes to diagnosing illness, still thinks people will use it for clues to what ails them "because they've been using Google for diagnosis and this is so much better than Google." * Ultimately, "I don't think we have a super clear understanding of what it's good for and what it's not," Jha said. 2. Output is super dependent on input. And your average person may not know the correct inputs. * "The way a patient's question can be phrased can lead to variability in how an LLM responds," said Duke University's Monica Agrawal.
* "If they have incomplete context or they share a subjective impression or they have a misconception when they're seeking advice, LLMs have an ability more so than a doctor to reinforce those misconceptions." 3. The way it says things can be problematic. "I worry some of these LLMs speak with a level of confidence that is really unjustified," Jha said. * It is also problematic that models generally are built to tell people what they want to hear, Agrawal said. "In the places where a doctor might push back ... we're not seeing necessarily the same behavior in models." * "If you say, 'I have a headache,' I don't say, 'Oh I think you have a migraine' -- I would say, 'Tell me more about it,'" Wachter said. "The tools don't naturally do that, and I think the consumer-facing tools of the future will." 4. Most people using AI don't have the expertise to spot mistakes. There's a divide between "professional use of these tools and the laypeople use of these tools," Wachter said. * Whereas they can be extremely helpful to doctors, your average patient probably doesn't have the medical knowledge to identify when a response doesn't apply or seems off. What we're watching: Today's models are constantly being re-trained -- and generally improved.
[6]
The dangers of asking ChatGPT your health questions
ChatGPT Health struggles to recognise when users need urgent care, according to a new study. More than 230 million people a week ask ChatGPT for medical advice - from checking whether food is safe to eat, to managing allergies, or finding remedies to shake off a cold, according to OpenAI. Despite performing well for textbook cases, ChatGPT Health failed to advise emergency care in serious cases, according to a new study published in Nature. The study found that while the tool generally handled clear-cut emergencies correctly, it underestimated more than half of the cases that required emergency care. "We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" said Ashwin Ramaswamy, lead author of the study at Mount Sinai in New York. "ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions," he said. He added that the language model struggled in situations where the danger is not immediately obvious. In one asthma scenario, the system identified early warning signs of respiratory failure in its explanation but still advised waiting rather than seeking emergency treatment, he noted. The research team created 60 structured clinical scenarios across 21 medical specialties with cases ranging from minor conditions appropriate for home care to true medical emergencies. Three independent physicians determined the correct level of urgency for each case using guidelines from 56 medical societies. ChatGPT Health was launched by OpenAI in January 2026, allowing users to connect their health information - such as medical records and data from wellness apps like MyFitnessPal - to receive more personalised and contextual responses. The study also examined how the model responded to users reporting self-harm intentions and found similar results. ChatGPT Health is supposed to be programmed so that when someone mentions self-harm or suicidal thoughts, it directly encourages them to seek help and call a public health number. The banner "Help is available," linking to the suicide and crisis lifeline, appeared inconsistently during the study. The authors noted that the guardrail answered more reliably for the patient who had not identified a means of self-harm than for those who had. "The pattern was not merely inconsistent but paradoxically inverted relative to clinical severity," the study found. Despite the findings, the researchers did not suggest consumers should abandon AI health tools altogether. "As a medical student training at a time when AI health tools are already in the hands of millions, I see them as technologies we must learn to integrate thoughtfully into care rather than substitutes for clinical judgment," said Alvira Tyagi, second author of the study. The study authors advised that people experiencing worsening or concerning symptoms, including chest pain, shortness of breath, severe allergic reactions, or changes in mental status, should seek medical care directly rather than relying solely on chatbot guidance. The study also noted that AI language models are constantly evolving and frequently updated, meaning performance can change over time. "Starting medical training alongside tools that are evolving in real time makes it clear that today's results are not set in stone," said Tyagi. 
She added that the rapidly changing reality calls for ongoing review to ensure that technology improvements translate into safer care.
[7]
Is ChatGPT Health safe? Study finds AI missed half of medical emergencies
We've all been there. Googling symptoms at midnight, convinced that the slight itch in your throat is the beginning of something sinister, only for the doctor to look up from their clipboard and tell you it's a common cold. The internet, it turns out, has always had a flair for the dramatic. But what if the opposite happened? What if you asked an AI - one built specifically to help you navigate your health - about your symptoms, and it told you not to worry? And what if, this time, there actually was something to worry about? A study published in Nature Medicine suggests ChatGPT Health may be doing exactly that. And here's the kicker: you can't even use it yet. ChatGPT Health is still waitlisted. OpenAI hasn't fully released it to the public, saying that they still need to improve its safety and reliability before wider rollout. Researchers at Mount Sinai Hospital tested the chatbot across 60 real medical scenarios. In more than half of genuine emergencies - 51.6%, to be exact - the bot told patients to book an appointment within the next day or two. Not call an ambulance, not go to a hospital. Just wait. We're not talking close calls. We're talking respiratory failure, diabetic ketoacidosis - conditions that kill within hours if left untreated. "Any doctor, and any person who's gone through any degree of training, would say that patient needs to go to the emergency department," lead study author Dr. Ashwin Ramaswamy told NBC News. The inconsistency is what makes it so unsettling. Stroke - with its unmistakable symptoms - was correctly flagged as an emergency every single time. But subtler crises flew under the radar. A patient with a three-day sore throat was urgently told to see a doctor. The bot, as Ramaswamy put it, was "inverted to clinical risk." OpenAI pushed back, arguing the study doesn't reflect how ChatGPT Health is designed to work, which is as an ongoing conversation, not a single query. That may be true. But with over 40 million people already turning to ChatGPT for health advice - on the regular, general-purpose version - the trajectory here is clear. This product is coming. The waitlist won't last forever. As Dr. John Mafi of UCLA Health puts it, "Before you roll something like this out to make life-affecting decisions, you need to rigorously test it." AI healthcare has real promise -- especially for people living far from medical facilities, or those who can't get an appointment for weeks. And right now, researchers are unambiguous: in a real emergency, call a professional for help. Don't ask a chatbot first.
More than 40 million people consult ChatGPT daily for health information, but new research reveals that AI chatbots correctly identified medical conditions in fewer than 34.5% of cases. Studies show these tools undertriaged 52% of emergency cases and provided correct follow-up steps just 44.2% of the time, raising urgent questions about patient safety as AI healthcare becomes mainstream.
A groundbreaking study published in Nature Medicine reveals that AI chatbots for medical advice are missing the mark far more often than users might expect. When 1,298 participants in the UK used large language models (LLMs) like ChatGPT or Meta's Llama 3 for medical guidance, the tools correctly identified medical conditions in fewer than 34.5% of cases [1]. This comes at a critical time when OpenAI reports that more than 40 million people consult its platform daily for health-related questions, with 1 in 4 of ChatGPT's approximately 800 million regular users submitting a health care prompt every week [5].
The study's findings challenge the growing confidence in AI for health information. While large language models now achieve scores on medical knowledge benchmarks comparable to passing the US Medical Licensing Exam, and clinical documents from LLMs are rated as equivalent to or better than those written by doctors, real-world application tells a different story [1]. After the initial diagnosis, the AI tools provided correct follow-up steps just 44.2% of the time, highlighting a dangerous gap between theoretical capability and practical performance.

Researchers at the Icahn School of Medicine at Mount Sinai conducted the first independent safety evaluation of ChatGPT Health since its January 2026 launch, focusing on triage, the critical process of assessing the severity of a patient's condition. The results were alarming: ChatGPT Health undertriaged 52% of emergency cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24-48 hour evaluation rather than the emergency department [3]. A separate Nature study confirmed that among gold-standard emergencies, ChatGPT undertriaged over half of cases [2].

Lead author and urologist Ashwin Ramaswamy explained the study's core question: "if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?" The answer, most of the time, is no [3]. In one respiratory failure case, the AI clearly identified symptoms as early warning signs but reassured the patient to wait and monitor instead of urging emergency help. The system did correctly triage textbook emergencies like stroke and anaphylaxis, but failed at nuanced situations where clinical judgment matters most.

The inaccurate medical advice often stems from how users interact with these systems. Andrew Bean, who studies AI systems at Oxford University, notes that "people don't know what they are supposed to be telling the model" [4]. The study found that in 16 of 30 sampled interactions, initial messages contained only partial information about symptoms [1].
Word choice proves critical. In one scenario, two users described the same condition differently. One mentioned "the worst headache I've ever had" and was directed to the emergency room immediately. The other, who didn't use that explicit description, was told to take aspirin and stay home, despite the condition being life-threatening [4]. Duke University's Monica Agrawal warns that "if they have incomplete context or they share a subjective impression or they have a misconception when they're seeking advice, LLMs have an ability more so than a doctor to reinforce those misconceptions" [5].

Paradoxically, the study revealed that in two cases, LLMs provided initially correct responses but added new and incorrect responses after users added additional details, suggesting that conversing more with the chatbots did not improve the probability of receiving accurate guidance [1].

The Mount Sinai study uncovered particularly concerning findings about suicide-risk alerts, which "appeared inconsistently" and were "inverted relative to clinical risk, appearing more reliably for lower-risk scenarios than for cases when someone shared how they intended to hurt themselves" [3]. Mount Sinai Health System's chief AI officer Girish Nadkarni called this "a particularly surprising and concerning finding," noting that in real life, when someone talks about exactly how they would harm themselves, that's a sign of more immediate and serious danger, not less.

Dr. Alexa Mieses Malchuk, a family physician who uses AI to streamline administrative work, fears these tools could give people a false sense of security, telling them they don't have to go to the doctor or get a condition examined [2]. "That could be a missed opportunity to diagnose something early," she warned. She's also noticed patients coming to her less willing to share that they've done their own research using these tools, but more certain about what they believe their diagnosis to be.

Despite the missed medical diagnoses and AI diagnostic accuracy concerns, some medical professionals argue these tools still provide value in a healthcare system plagued by accessibility issues. According to OpenAI's research, 7 in 10 healthcare-related conversations happen outside normal clinic hours, and an average of more than 580,000 healthcare inquiries in the U.S. come from "hospital deserts," places more than a 30-minute drive from a general medical or children's hospital [3].

"We've made accessibility to medical information and medical judgment so hard in this country, and ChatGPT makes it so easy," said Ashish Jha, former White House COVID response coordinator under President Biden [5]. Robert Wachter, chair of the Department of Medicine at UCSF, argues that with healthcare difficult to afford and access, "the advice you get from the tools is substantially better than nothing and better than what you would get from your second cousin" [4].
Adam Rodman, a hospitalist who researches AI programs at Harvard Medical School, discourages people from using AI to triage emergency situations but sees value when it is used appropriately. "A good time to use a large language model is when you're about to go see a doctor -- or after you see your doctor," Rodman advises [4]. This approach allows for patient education and more efficient use of time with medical professionals.

Mieses Malchuk recommends using AI as a springboard, not as the end-all for self-diagnosis. She notes that AI chatbot users may be omitting important information about their medical situations, leading to fundamentally different diagnosis or treatment recommendations. "Their responses are only as good as the questions we ask," she explains [2]. American Medical Association CEO John Whyte emphasizes that "too often people are using this as an expert and not as an assistant" [5].

One significant issue is how AI presents information. "I worry some of these LLMs speak with a level of confidence that is really unjustified," Jha noted [5]. Models are generally built to tell people what they want to hear, and "in the places where a doctor might push back ... we're not seeing necessarily the same behavior in models," Agrawal added. Most people using AI don't have the medical expertise to spot mistakes, creating a divide between professional and lay use of these tools.

A survey by OpenAI found that 3 in 5 US adults report using AI for health, consulting it when they first feel unwell, to prepare for clinician visits, and to better comprehend patient instructions [1]. Yet despite ChatGPT's small disclaimer that "ChatGPT can make mistakes. Check important info," many people take the chatbot's word as fact. Recent polling from the Annenberg Public Policy Center found that 63% of respondents find AI-generated health information reliable, even as public trust in federal agencies like the CDC, FDA, and NIH decreased by 5-7% over the past year [2].

The technology is burrowing deeper into the healthcare industry thanks to a friendly regulatory environment. AI tools can now renew prescriptions in Utah, and FDA Commissioner Marty Makary told Fox Business that some devices and software can provide health information without FDA regulation [3]. However, experts are watching what new state and federal regulatory guardrails will be put around AI in healthcare. David Blumenthal, former president of the Commonwealth Fund, suggests that "it's possible that rating agencies may arise that will address the reliability of different chatbots for different functions" [5].
An OpenAI spokesperson responded to the studies by asserting that ChatGPT should be thought of as a work in progress, with safety updates and improvements still coming to enhance how the chatbot deals with sensitive situations [3]. Karan Singhal, who leads OpenAI's health AI team, claimed its latest GPT-5 models correctly refer emergency cases nearly 99% of the time, noting that in real life, health conversations typically unfold over multiple turns where the model asks follow-up questions [5]. However, critics point out that the careful debate over how AI should be deployed and evaluated in clinical settings often fails to acknowledge that direct-to-consumer use is already widespread [5].