2 Sources
[1]
Machine learning models fall short in predicting in-hospital mortality
Virginia Tech | March 11, 2025

It would be greatly beneficial to physicians trying to save lives in intensive care units if they could be alerted when a patient's condition rapidly deteriorates or shows vitals in highly abnormal ranges. While current machine learning models are attempting to achieve that goal, a Virginia Tech study recently published in Communications Medicine shows that they are falling short: models for in-hospital mortality prediction, which refers to predicting the likelihood of a patient dying in the hospital, failed to recognize 66 percent of the injuries.

"Predictions are only valuable if they can accurately recognize critical patient conditions. They need to be able to identify patients with worsening health conditions and alert doctors promptly," said Danfeng "Daphne" Yao, professor in the Department of Computer Science and affiliate faculty at the Sanghani Center for Artificial Intelligence and Data Analytics.

"Our study found serious deficiencies in the responsiveness of current machine learning models," said Yao. "Most of the models we evaluated cannot recognize critical health events, and that poses a major problem."

To conduct their research, Yao and computer science Ph.D. student Tanmoy Sarkar Pias collaborated with:

Sharmin Afrose, Oak Ridge National Laboratory, Tennessee
Moon Das Tuli, Greenlife Medical College Hospital, Dhaka, Bangladesh
Ipsita Hamid Trisha, Banner University Medical Center, Tucson, and University of Arizona College of Medicine
Xinwei Deng, Department of Statistics at Virginia Tech
Charles B. Nemeroff, Department of Psychiatry and Behavioral Sciences, University of Texas at Austin Dell Medical School

Their paper, "Low Responsiveness of Machine Learning Models to Critical or Deteriorating Health Conditions," shows that patient data alone is not enough to teach models how to determine future health risks. Calibrating health care models with "test patients" helps reveal the models' true abilities and limitations.

The team developed multiple medical testing approaches, including a gradient ascent method and a neural activation map. Color changes in the neural activation map indicate how well machine learning models react to worsening patient conditions. The gradient ascent method can automatically generate special test cases, making it easier to evaluate the quality of a model (a rough sketch of this idea follows below).

"We systematically assessed machine learning models' ability to respond to serious medical conditions using new test cases, some of which are time series, meaning they use a sequence of observations collected at regular intervals to forecast future values," Pias said. "Guided by medical doctors, our evaluation involved multiple machine learning models, optimization techniques, and four data sets for two clinical prediction tasks."

In addition to failing to recognize 66 percent of injuries in in-hospital mortality prediction, the models in some instances failed to generate adequate mortality risk scores for all test cases. The study identified similar deficiencies in the responsiveness of five-year breast and lung cancer prognosis models.

These findings inform future health care research using machine learning and artificial intelligence (AI), Yao said, because they show that statistical machine learning models trained solely on patient data are grossly insufficient and have many dangerous blind spots. To diversify training data, one may leverage strategically developed synthetic samples, an approach Yao's team explored in 2022 to enhance prediction fairness for minority patients.
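The article does not give implementation details for the gradient ascent method, but the general idea of gradient-based test-case generation can be sketched as follows: treat a trained risk model as differentiable, ascend the predicted risk with respect to the input vitals, and check whether the score actually climbs. The toy network, feature layout, and step size below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: gradient-ascent generation of "test patients" for a
# mortality-risk model. The network, features, and hyperparameters are toy
# assumptions, not the published method.
import torch
import torch.nn as nn

# Hypothetical risk model mapping six normalized vital signs to a mortality probability.
risk_model = nn.Sequential(
    nn.Linear(6, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
risk_model.eval()

# Start from a roughly "normal" patient (features assumed normalized to [0, 1]).
x = torch.full((1, 6), 0.5, requires_grad=True)

step_size = 0.05
risk_trajectory = []
for _ in range(20):
    risk = risk_model(x)[0, 0]          # predicted mortality risk for the current vitals
    risk_trajectory.append(risk.item())
    risk.backward()                     # gradient of the risk w.r.t. the input features
    with torch.no_grad():
        x += step_size * x.grad.sign()  # push the vitals in the risk-increasing direction
        x.clamp_(0.0, 1.0)              # keep features in their valid range
    x.grad.zero_()

# A responsive model should assign clearly rising risk along this synthetic
# deterioration; a flat trajectory is the kind of blind spot the study describes.
print(risk_trajectory)
```

In a real evaluation, the starting point would be an actual patient record and the model one of the study's trained clinical models; only the probing loop is the point of this sketch.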
"A more fundamental design is to incorporate medical knowledge deeply into clinical machine learning models," she said. "This is highly interdisciplinary work, requiring a large team with both computing and medical expertise." In the meantime, Yao's group is actively testing other medical models, including large language models, for their safety and efficacy in time-sensitive clinical tasks, such as sepsis detection. "AI safety testing is a race against time, as companies are pouring products into the medical space," she said. "Transparent and objective testing is a must. AI testing helps protect people's lives and that's what my group is committed to." Virginia Tech Journal reference: Pias, T. S., et al. (2025). Low responsiveness of machine learning models to critical or deteriorating health conditions. Communications Medicine. doi.org/10.1038/s43856-025-00775-0.
[2]
Machine learning models fail to detect key health deteriorations, research shows
It would be greatly beneficial to physicians trying to save lives in intensive care units if they could be alerted when a patient's condition rapidly deteriorates or shows vitals in highly abnormal ranges. While current machine learning models are attempting to achieve that goal, a Virginia Tech study published in Communications Medicine shows that they are falling short: models for in-hospital mortality prediction, which refers to predicting the likelihood of a patient dying in the hospital, failed to recognize 66% of the injuries.

"Predictions are only valuable if they can accurately recognize critical patient conditions. They need to be able to identify patients with worsening health conditions and alert doctors promptly," said Danfeng "Daphne" Yao, professor in the Department of Computer Science and affiliate faculty member at the Sanghani Center for Artificial Intelligence and Data Analytics.

"Our study found serious deficiencies in the responsiveness of current machine learning models," said Yao. "Most of the models we evaluated cannot recognize critical health events, and that poses a major problem."

To conduct their research, Yao and computer science Ph.D. student Tanmoy Sarkar Pias collaborated with a number of researchers. Their paper, "Low Responsiveness of Machine Learning Models to Critical or Deteriorating Health Conditions," shows that patient data alone is not enough to teach models how to determine future health risks. Calibrating health care models with "test patients" helps reveal the models' true abilities and limitations.

The team developed multiple medical testing approaches, including a gradient ascent method and a neural activation map. Color changes in the neural activation map indicate how well machine learning models react to worsening patient conditions. The gradient ascent method can automatically generate special test cases, making it easier to evaluate the quality of a model.

"We systematically assessed machine learning models' ability to respond to serious medical conditions using new test cases, some of which are time series, meaning they use a sequence of observations collected at regular intervals to forecast future values," Pias said. "Guided by medical doctors, our evaluation involved multiple machine learning models, optimization techniques, and four data sets for two clinical prediction tasks."

In addition to failing to recognize 66% of injuries in in-hospital mortality prediction, the models in some instances failed to generate adequate mortality risk scores for all test cases. The study identified similar deficiencies in the responsiveness of five-year breast and lung cancer prognosis models.

These findings inform future health care research using machine learning and artificial intelligence (AI), Yao said, because they show that statistical machine learning models trained solely on patient data are grossly insufficient and have many dangerous blind spots. To diversify training data, one may leverage strategically developed synthetic samples, an approach Yao's team explored in 2022 to enhance prediction fairness for minority patients.

"A more fundamental design is to incorporate medical knowledge deeply into clinical machine learning models," she said. "This is highly interdisciplinary work, requiring a large team with both computing and medical expertise."

In the meantime, Yao's group is actively testing other medical models, including large language models, for their safety and efficacy in time-sensitive clinical tasks, such as sepsis detection.
"AI safety testing is a race against time, as companies are pouring products into the medical space," she said. "Transparent and objective testing is a must. AI testing helps protect people's lives and that's what my group is committed to."
A Virginia Tech study reveals significant shortcomings in current machine learning models for predicting in-hospital mortality, with models failing to recognize 66% of critical health events.
A recent study conducted by Virginia Tech researchers has uncovered significant limitations in current machine learning models used for predicting in-hospital mortality. The research, published in Communications Medicine, reveals that these models fail to recognize 66% of critical health events, raising concerns about their effectiveness in real-world medical settings [1][2].
The study, led by Professor Danfeng "Daphne" Yao from the Department of Computer Science at Virginia Tech, evaluated multiple machine learning models using various data sets and clinical prediction tasks. The researchers found that:

- Models for in-hospital mortality prediction failed to recognize 66% of the injuries signaling critical health events.
- In some instances, the models failed to generate adequate mortality risk scores for all test cases.
- Five-year breast and lung cancer prognosis models showed similar deficiencies in responsiveness.
These findings highlight the potential dangers of relying solely on statistical machine learning models trained on patient data for critical healthcare decisions.
To assess the models' responsiveness, the research team developed innovative testing methods:

- A gradient ascent method that automatically generates special test cases, making it easier to evaluate the quality of a model.
- A neural activation map whose color changes indicate how well a model reacts to worsening patient conditions.
- New test cases developed under the guidance of medical doctors, some of them time series of observations collected at regular intervals (a sketch of probing a model with such a test patient follows below).
These approaches provide a more comprehensive evaluation of model performance and reveal limitations that may not be apparent through traditional testing methods.
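As a companion illustration of the "test patient" idea, the sketch below probes a toy time-series risk model with a hand-crafted sequence of deteriorating vitals and compares its score against a stable baseline. The recurrent architecture, vital-sign layout, and normalization are assumptions made for illustration; the study's actual models and data sets differ.

```python
# Illustrative sketch only: probing a time-series risk model with a hand-crafted
# "test patient" whose vitals deteriorate, to check that the predicted risk responds.
import numpy as np
import torch
import torch.nn as nn

class RiskLSTM(nn.Module):
    """Toy stand-in for a time-series in-hospital mortality model."""
    def __init__(self, n_features: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 16, batch_first=True)
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        # Risk score taken from the final time step of the sequence.
        return torch.sigmoid(self.head(out[:, -1]))

model = RiskLSTM()
model.eval()

# Hypothetical hourly vitals [heart rate, systolic BP, SpO2, respiratory rate],
# normalized so that 0.5 is roughly normal. One test patient stays stable; the
# other drifts toward highly abnormal values over 12 hours.
hours = 12
stable = np.full((hours, 4), 0.5, dtype=np.float32)
deteriorating = np.linspace(
    [0.5, 0.5, 0.5, 0.5], [0.95, 0.10, 0.10, 0.95], hours, dtype=np.float32
)

with torch.no_grad():
    risk_stable = model(torch.from_numpy(stable)[None]).item()
    risk_worse = model(torch.from_numpy(deteriorating)[None]).item()

# A responsive model should score the deteriorating test patient well above the
# stable one; near-identical scores are the failure mode the study highlights.
print(f"stable: {risk_stable:.3f}  deteriorating: {risk_worse:.3f}")
```

The contrast between the two scores is the responsiveness signal; in the study, such probes were applied to trained clinical models rather than the untrained toy network used here.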
The study's results have significant implications for the future of AI and machine learning in healthcare:

- Statistical machine learning models trained solely on patient data are grossly insufficient and have many dangerous blind spots.
- A more fundamental design would incorporate medical knowledge deeply into clinical machine learning models, which requires large interdisciplinary teams with both computing and medical expertise.
Professor Yao's team is actively working on addressing these challenges:

- Testing other medical models, including large language models, for their safety and efficacy in time-sensitive clinical tasks such as sepsis detection.
- Exploring strategically developed synthetic samples to diversify training data, building on the team's 2022 work on prediction fairness for minority patients.
As companies rapidly introduce AI products into the medical field, the researchers stress the critical need for transparent and objective testing:
"AI safety testing is a race against time, as companies are pouring products into the medical space," said Professor Yao. "Transparent and objective testing is a must. AI testing helps protect people's lives and that's what my group is committed to" 12.
This study serves as a crucial reminder of the importance of rigorous testing and evaluation of AI systems in healthcare, where the stakes are often life and death. As machine learning continues to advance, ensuring its reliability and safety in medical applications remains a top priority for researchers and healthcare professionals alike.
Summarized by
Navi