Curated by THEOUTPOST
On Tue, 8 Apr, 4:03 PM UTC
2 Sources
[1]
Is AI in medicine playing fair?
As artificial intelligence (AI) rapidly integrates into health care, a new study by researchers at the Icahn School of Medicine at Mount Sinai reveals that generative AI models may recommend different treatments for the same medical condition based solely on a patient's socioeconomic and demographic background. Their findings, detailed in the April 7, 2025, online issue of Nature Medicine, highlight the importance of early detection and intervention to ensure that AI-driven care is safe, effective, and appropriate for all.

As part of their investigation, the researchers stress-tested nine large language models (LLMs) on 1,000 emergency department cases, each replicated with 32 different patient backgrounds, generating more than 1.7 million AI-generated medical recommendations. Despite identical clinical details, the AI models occasionally altered their decisions based on a patient's socioeconomic and demographic profile, affecting key areas such as triage priority, diagnostic testing, treatment approach, and mental health evaluation.

"Our research provides a framework for AI assurance, helping developers and health care institutions design fair and reliable AI tools," says co-senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. "By identifying when AI shifts its recommendations based on background rather than medical need, we inform better model training, prompt design, and oversight. Our rigorous validation process tests AI outputs against clinical standards, incorporating expert feedback to refine performance. This proactive approach not only enhances trust in AI-driven care but also helps shape policies for better health care for all."

One of the study's most striking findings was the tendency of some AI models to escalate care recommendations, particularly for mental health evaluations, based on patient demographics rather than medical necessity. In addition, high-income patients were more often recommended advanced diagnostic tests such as CT scans or MRI, while low-income patients were more frequently advised to undergo no further testing. The scale of these inconsistencies underscores the need for stronger oversight, the researchers say.

While the study provides critical insights, the researchers caution that it represents only a snapshot of AI behavior. Future research will include continued assurance testing to evaluate how AI models perform in real-world clinical settings and whether different prompting techniques can reduce bias. The team also aims to work with other health care institutions to refine AI tools, ensuring they uphold the highest ethical standards and treat all patients fairly.

"I am delighted to partner with Mount Sinai on this critical research to ensure AI-driven medicine benefits patients across the globe," says physician-scientist and first author of the study, Mahmud Omar, MD, who consults with the research team. "As AI becomes more integrated into clinical care, it's essential to thoroughly evaluate its safety, reliability, and fairness. By identifying where these models may introduce bias, we can work to refine their design, strengthen oversight, and build systems that ensure patients remain at the heart of safe, effective care. This collaboration is an important step toward establishing global best practices for AI assurance in health care."
"AI has the power to revolutionize health care, but only if it's developed and used responsibly," says co-senior author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health Director of the Hasso Plattner Institute for Digital Health, and the Irene and Dr. Arthur M. Fishberg Professor of Medicine, at the Icahn School of Medicine at Mount Sinai. "Through collaboration and rigorous validation, we are refining AI tools to uphold the highest ethical standards and ensure appropriate, patient-centered care. By implementing robust assurance protocols, we not only advance technology but also build the trust essential for transformative health care. With proper testing and safeguards, we can ensure these technologies improve care for everyone -- not just certain groups." Next, the investigators plan to expand their work by simulating multistep clinical conversations and piloting AI models in hospital settings to measure their real-world impact. They hope their findings will guide the development of policies and best practices for AI assurance in health care, fostering trust in these powerful new tools. The paper is titled "Socio-Demographic Biases in Medical Decision-Making by Large Language Models: A Large-Scale Multi-Model Analysis." The study's authors, as listed in the journal, are Mahmud Omar, Shelly Soffer, Reem Agbareia, Nicola Luigi Bragazzi, Donald U. Apakama, Carol R. Horowitz, Alexander W. Charney, Robert Freeman, Benjamin Kummer, Benjamin S. Glicksberg, Girish N. Nadkarni, and Eyal Klang.
[2]
Is AI in medicine playing fair? Researchers stress-test generative models, urging safeguards
A groundbreaking study by Mount Sinai researchers uncovers potential biases in AI-driven medical recommendations based on patients' socioeconomic and demographic backgrounds, highlighting the need for robust AI assurance in healthcare.
A groundbreaking study conducted by researchers at the Icahn School of Medicine at Mount Sinai has revealed that generative AI models may recommend different treatments for identical medical conditions based solely on a patient's socioeconomic and demographic background. The findings, published in the April 7, 2025, online issue of Nature Medicine, underscore the critical need for early detection and intervention to ensure AI-driven healthcare is safe, effective, and equitable for all patients [1][2].
The research team, led by Dr. Eyal Klang and Dr. Girish N. Nadkarni, stress-tested nine large language models (LLMs) on 1,000 emergency department cases. Each case was replicated with 32 different patient backgrounds, generating over 1.7 million AI-generated medical recommendations. Despite identical clinical details, the AI models occasionally altered their decisions based on a patient's socioeconomic and demographic profile [1].
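To make that design concrete, the sketch below shows one way such a stress test could be structured: the same clinical vignette is re-queried under several socio-demographic framings and the answers are tallied for comparison. This is a hypothetical illustration, not the authors' code; query_model is a stand-in for whatever LLM is under test, and the case text, profile labels, and prompt wording are invented placeholders.

```python
from itertools import product
from collections import Counter

# Hypothetical stress-test harness: identical clinical details, varied
# socio-demographic framing, recommendations collected for comparison.
CASES = [
    "45-year-old presenting with 2 hours of chest pain and diaphoresis.",
]
PROFILES = [  # illustrative labels only; the study used 32 backgrounds
    "a high-income, privately insured",
    "a low-income, uninsured",
    "an unhoused",
]

def build_prompt(profile: str, case: str) -> str:
    return (
        f"The patient is {profile} patient. {case} "
        "Assign a triage level (1-5) and recommend the next diagnostic step."
    )

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under test (e.g., a chat-completion API)."""
    return "Triage level 2; obtain ECG and troponin"  # canned answer so the sketch runs

def stress_test(cases, profiles) -> Counter:
    """Count each (case, profile, recommendation) triple for later disparity analysis."""
    tally = Counter()
    for case, profile in product(cases, profiles):
        tally[(case, profile, query_model(build_prompt(profile, case)))] += 1
    return tally

if __name__ == "__main__":
    # In the study itself, roughly 1,000 cases x 32 profiles were run against nine models;
    # recommendations that shift with the framing alone flag potential bias.
    for key, count in stress_test(CASES, PROFILES).items():
        print(count, key)
```

In an analysis like the one described, the interesting signal is any systematic difference in triage level or recommended testing across profiles, since the clinical content of every prompt is identical.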
The study revealed significant inconsistencies in AI-generated recommendations across several key areas, including triage priority, diagnostic testing, treatment approach, and mental health evaluation.
One of the most striking findings was the tendency of some AI models to escalate care recommendations, particularly for mental health evaluations, based on patient demographics rather than medical necessity. Additionally, high-income patients were more frequently recommended advanced diagnostic tests such as CT scans or MRI, while low-income patients were more often advised to undergo no further testing [1][2].
Dr. Klang emphasized that their research provides a framework for AI assurance, helping developers and healthcare institutions design fair and reliable AI tools. The team's rigorous validation process tests AI outputs against clinical standards and incorporates expert feedback to refine performance [1].
While the study offers critical insights, the researchers caution that it represents only a snapshot of AI behavior. Future research will continue to include assurance testing to evaluate how AI models perform in real-world clinical settings and whether different prompting techniques can reduce bias [2].
Dr. Mahmud Omar, the first author of the study, stressed the importance of thoroughly evaluating AI's safety, reliability, and fairness as it becomes more integrated into clinical care. The team aims to work with other healthcare institutions to refine AI tools, ensuring they uphold the highest ethical standards and treat all patients fairly [1][2].
The investigators plan to expand their work by simulating multistep clinical conversations and piloting AI models in hospital settings to measure their real-world impact.
Dr. Nadkarni emphasized that while AI has the power to revolutionize healthcare, it must be developed and used responsibly. By implementing robust assurance protocols, the team aims to advance technology and build the trust essential for transformative healthcare [1][2].
This study marks a significant step towards establishing global best practices for AI assurance in healthcare, ensuring that these powerful tools improve care for all patients, regardless of their socioeconomic or demographic background.
Reference
[1] Is AI in medicine playing fair?
[2] Medical Xpress - Medical and Health News | Is AI in medicine playing fair? Researchers stress-test generative models, urging safeguards

A global initiative has produced a set of recommendations to address potential bias in AI-based medical technologies, aiming to ensure equitable and effective healthcare for all.
3 Sources
Researchers at Mount Sinai have identified strategies for using large language models in healthcare settings, potentially reducing costs by up to 17-fold while maintaining performance.
4 Sources
University of Maryland School of Medicine researchers argue that while AI is crucial in predictive medicine, it should be combined with traditional mathematical modeling for optimal outcomes in healthcare, especially in cancer treatment.
2 Sources
Recent studies highlight the potential of artificial intelligence in medical settings, demonstrating improved diagnostic accuracy and decision-making. However, researchers caution about the need for careful implementation and human oversight.
2 Sources
A recent study reveals that AI models, including ChatGPT, do not exhibit racial or sex-based bias when suggesting opioid treatments. This finding challenges concerns about AI perpetuating healthcare disparities.
2 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved