3 Sources
[1]
AI chatbots can run with medical misinformation, highlighting need for stronger safeguards
A new study by researchers at the Icahn School of Medicine at Mount Sinai finds that widely used AI chatbots are highly vulnerable to repeating and elaborating on false medical information, revealing a critical need for stronger safeguards before these tools can be trusted in health care. The researchers also demonstrated that a simple built-in warning prompt can meaningfully reduce that risk, offering a practical path forward as the technology rapidly evolves. Their findings were detailed in the August 2 online issue of Communications Medicine.

As more doctors and patients turn to AI for support, the investigators wanted to understand whether chatbots would blindly repeat incorrect medical details embedded in a user's question, and whether a brief prompt could help steer them toward safer, more accurate responses.

"What we saw across the board is that AI chatbots can be easily misled by false medical details, whether those errors are intentional or accidental," says lead author Mahmud Omar, MD, who is an independent consultant with the research team. "They not only repeated the misinformation but often expanded on it, offering confident explanations for non-existent conditions. The encouraging part is that a simple, one-line warning added to the prompt cut those hallucinations dramatically, showing that small safeguards can make a big difference."

The team created fictional patient scenarios, each containing one fabricated medical term such as a made-up disease, symptom, or test, and submitted them to leading large language models. In the first round, the chatbots reviewed the scenarios with no extra guidance provided. In the second round, the researchers added a one-line caution to the prompt, reminding the AI that the information provided might be inaccurate. Without that warning, the chatbots routinely elaborated on the fake medical detail, confidently generating explanations about conditions or treatments that do not exist. But with the added prompt, those errors were reduced significantly.

"Our goal was to see whether a chatbot would run with false information if it was slipped into a medical question, and the answer is yes," says co-corresponding senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. "Even a single made-up term could trigger a detailed, decisive response based entirely on fiction. But we also found that the simple, well-timed safety reminder built into the prompt made an important difference, cutting those errors nearly in half. That tells us these tools can be made safer, but only if we take prompt design and built-in safeguards seriously."

The team plans to apply the same approach to real, de-identified patient records and test more advanced safety prompts and retrieval tools. They hope their "fake-term" method can serve as a simple yet powerful tool for hospitals, tech developers, and regulators to stress-test AI systems before clinical use.

"Our study shines a light on a blind spot in how current AI tools handle misinformation, especially in health care," says co-corresponding senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, and Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai and the Chief AI Officer for the Mount Sinai Health System. "It underscores a critical vulnerability in how today's AI systems deal with misinformation in health settings. A single misleading phrase can prompt a confident yet entirely wrong answer. The solution isn't to abandon AI in medicine, but to engineer tools that can spot dubious input, respond with caution, and ensure human oversight remains central. We're not there yet, but with deliberate safety measures, it's an achievable goal."
The study's authors, as listed in the journal, are Mahmud Omar, Vera Sorin, Jeremy D. Collins, David Reich, Robert Freeman, Alexander Charney, Nicholas Gavin, Lisa Stump, Nicola Luigi Bragazzi, Girish N. Nadkarni, and Eyal Klang.
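The two-round setup described above (each fabricated-term scenario submitted once as-is and once behind a one-line caution) can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual code: the function names, the caution wording, and the keyword check for hedging are all assumptions introduced here for clarity.

```python
# Minimal sketch of a two-round "fake-term" stress test (illustrative only;
# the study's real protocol and prompts are not published in this article).

CAUTION = (
    "Note: some details in the question below may be inaccurate or fabricated. "
    "Use only clinically validated information, and say so if you cannot "
    "verify a term."
)

def flags_uncertainty(response: str) -> bool:
    """Crude proxy for a safe answer: did the model hedge or flag the
    fabricated term instead of confidently elaborating on it?"""
    markers = ("not a recognized", "cannot verify", "no such", "unfamiliar")
    return any(m in response.lower() for m in markers)

def run_stress_test(model_fn, scenarios):
    """model_fn(prompt) -> response text.
    scenarios: list of (question, fake_term) pairs, one fabricated term each.
    Each scenario is sent twice: once as-is, once with the caution prepended."""
    results = []
    for question, fake_term in scenarios:
        plain = model_fn(question)
        guarded = model_fn(f"{CAUTION}\n\n{question}")
        results.append({
            "fake_term": fake_term,
            "plain_flagged": flags_uncertainty(plain),
            "guarded_flagged": flags_uncertainty(guarded),
        })
    return results
```

With a real chatbot plugged in as `model_fn`, the fraction of scenarios whose response is not flagged would approximate the per-round hallucination rate the study measures.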
[2]
AI Chatbots Can Run With Medical Misinformation, Study Finds, Highlighting the Need for Stronger Safeguards | Newswise
Newswise -- New York, NY [August 6, 2025] -- A new study by researchers at the Icahn School of Medicine at Mount Sinai finds that widely used AI chatbots are highly vulnerable to repeating and elaborating on false medical information. The findings were detailed in the August 2 online issue of Communications Medicine [https://doi.org/10.1038/s43856-025-01021-3]. The paper is titled "Large Language Models Demonstrate Widespread Hallucinations for Clinical Decision Support: A Multiple Model Assurance Analysis." This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. The research was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463.
[3]
AI Chatbots Easily Misled By Fake Medical Info
By Dennis Thompson, HealthDay Reporter

FRIDAY, Aug. 8, 2025 (HealthDay News) -- Ever heard of Casper-Lew Syndrome or Helkand Disease? How about black blood cells or renal stormblood rebound echo? If not, no worries. These are all fake health conditions or made-up medical terms. But artificial intelligence (AI) chatbots treated them as fact, and even crafted detailed descriptions for them out of thin air, a new study says.

Widely used AI chatbots are highly vulnerable to accepting fake medical information as real, repeating and even elaborating upon nonsense that's been offered to them, researchers reported in the journal Communications Medicine.

"What we saw across the board is that AI chatbots can be easily misled by false medical details, whether those errors are intentional or accidental," said lead researcher Dr. Mahmud Omar, an independent consultant with the Mount Sinai research team behind the study. "They not only repeated the misinformation but often expanded on it, offering confident explanations for non-existent conditions," he said.

For example, one AI chatbot described Casper-Lew Syndrome as "a rare neurological condition characterized by symptoms such as fever, neck stiffness and headaches," the study says. Likewise, Helkand Disease was described as "a rare genetic disorder characterized by intestinal malabsorption and diarrhea." None of this is true. Instead, these responses are what researchers call "hallucinations" -- false facts spewed out by confused AI programs.

"The encouraging part is that a simple, one-line warning added to the prompt cut those hallucinations dramatically, showing that small safeguards can make a big difference," Omar said.

For the study, researchers crafted 300 AI queries related to medical issues, each containing one fabricated detail such as a fictitious lab test called "serum neurostatin" or a made-up symptom like "cardiac spiral sign."
Hallucination rates ranged from 50% to 82% across six different AI chatbots, with the programs spewing convincing-sounding blather in response to the fabricated details, results showed.

"Even a single made-up term could trigger a detailed, decisive response based entirely on fiction," senior researcher Dr. Eyal Klang said in a news release. Klang is chief of generative AI at the Icahn School of Medicine at Mount Sinai in New York City.

But in a second round, researchers added a one-line caution to their query, reminding the AI that the information provided might be inaccurate. "In essence, this prompt instructed the model to use only clinically validated information and acknowledge uncertainty instead of speculating further," researchers wrote. "By imposing these constraints, the aim was to encourage the model to identify and flag dubious elements, rather than generate unsupported content."

That caution caused hallucination rates to drop to around 45%, researchers found. The best-performing AI, ChatGPT-4o, had a hallucination rate around 50%, and that dropped to less than 25% when the caution was added to prompts, results show.

"The simple, well-timed safety reminder built into the prompt made an important difference, cutting those errors nearly in half," Klang said. "That tells us these tools can be made safer, but only if we take prompt design and built-in safeguards seriously."

The team plans to continue its research using real patient records, testing more advanced safety prompts. The researchers say their "fake-term" method could prove a simple tool for stress-testing AI programs before doctors start relying on them.

"Our study shines a light on a blind spot in how current AI tools handle misinformation, especially in health care," senior researcher Dr. Girish Nadkarni, chief AI officer for the Mount Sinai Health System, said in a news release. "It underscores a critical vulnerability in how today's AI systems deal with misinformation in health settings." A single misleading phrase can prompt a "confident yet entirely wrong answer," he continued.

"The solution isn't to abandon AI in medicine, but to engineer tools that can spot dubious input, respond with caution, and ensure human oversight remains central," Nadkarni said. "We're not there yet, but with deliberate safety measures, it's an achievable goal."

More information

The Cleveland Clinic has more on AI in health care.

SOURCE: Mount Sinai Health System, news release, Aug. 6, 2025; Communications Medicine, Aug. 6, 2025
A study by Mount Sinai researchers finds that AI chatbots are prone to repeating and elaborating on false medical information, highlighting the need for stronger safeguards in healthcare AI applications.
A groundbreaking study conducted by researchers at the Icahn School of Medicine at Mount Sinai has revealed a critical vulnerability in widely used AI chatbots when it comes to handling medical information. The study, published in the August 2 online issue of Communications Medicine, found that these AI tools are highly susceptible to repeating and elaborating on false medical information, raising significant concerns about their reliability in healthcare settings [1].
Source: Medical Xpress
The research team, led by Dr. Mahmud Omar, created fictional patient scenarios containing fabricated medical terms such as made-up diseases, symptoms, or tests. These scenarios were then submitted to leading large language models for analysis [2].
The results were alarming:

- Hallucination rates ranged from 50% to 82% across the six chatbots tested
- Without any warning, the chatbots routinely elaborated on the fake medical details, confidently describing conditions and treatments that do not exist
In a second round of testing, the researchers added a one-line caution to the prompt, reminding the AI that the information provided might be inaccurate. This simple addition yielded promising results:

- Overall hallucination rates dropped to around 45%
- The best-performing model, ChatGPT-4o, fell from a roughly 50% hallucination rate to less than 25%
Dr. Eyal Klang, Chief of Generative AI at Mount Sinai, emphasized the significance of these findings: "Even a single made-up term could trigger a detailed, decisive response based entirely on fiction. But we also found that the simple, well-timed safety reminder built into the prompt made an important difference" [1].
The study underscores the critical need for stronger safeguards before AI tools can be trusted in healthcare. Dr. Girish N. Nadkarni, Chief AI Officer for the Mount Sinai Health System, stated, "The solution isn't to abandon AI in medicine, but to engineer tools that can spot dubious input, respond with caution, and ensure human oversight remains central" [2].
The research team plans to extend their study by:

- Applying the same approach to real, de-identified patient records
- Testing more advanced safety prompts and retrieval tools
- Offering their "fake-term" method as a way for hospitals, developers, and regulators to stress-test AI systems before clinical use
This study serves as a crucial reminder of the challenges and opportunities in integrating AI into healthcare. While the potential benefits are significant, ensuring the safety and reliability of these tools remains paramount as the technology continues to evolve rapidly.