3 Sources
[1]
AI tools show limitations in diagnosing atypical emergency room cases
West Virginia University, May 23, 2025

Artificial intelligence tools can assist emergency room physicians in accurately predicting disease, but only for patients with typical symptoms, West Virginia University scientists have found.

Gangqing "Michael" Hu, assistant professor in the WVU School of Medicine Department of Microbiology, Immunology and Cell Biology and director of the WVU Bioinformatics Core facility, led a study that compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning. His findings, published in the journal Scientific Reports, demonstrate the need for incorporating greater amounts of different types of data in training AI technology to assist in disease diagnosis.

More data can make the difference in whether AI gives patients the correct diagnoses for what are called "challenging cases," which don't exhibit classic symptoms. As an example, Hu pointed to a trio of scenarios from his study involving patients who had pneumonia without the typical fever.

"In these three cases, all of the GPT models failed to give an accurate diagnosis," Hu said. "That made us dive in to look at the physicians' notes and we noticed the pattern of these being challenging cases. ChatGPT tends to get a lot of information from different resources on the internet, but these may not cover atypical disease presentation."

The study analyzed data from 30 public emergency department cases, which, for reasons of privacy, did not include demographics. Hu explained that in using ChatGPT to assist with diagnosis, physicians' notes are uploaded and the tool is asked to provide its top three diagnoses. Results varied for the versions Hu tested: the GPT-3.5, GPT-4, GPT-4o and o1 series.

"When we looked at whether the AI models gave the correct diagnosis in any of their top three results, we didn't see a significant improvement between the new version and the older version," he said. "But when we look at each model's number one diagnosis, the new version is about 15% to 20% higher in accuracy than the older version."

Given AI models' current low performance on complex and atypical cases, Hu said human oversight is a necessity for high-quality, patient-centered care when using AI as an assistive tool.

"We didn't do this study out of curiosity to see if the new model would give better results. We wanted to establish a basis for future studies that involve additional input," Hu said. "Currently, we input physician notes only. In the future, we want to improve the accuracy by including images and findings from laboratory tests."

Hu also plans to expand on findings from one of his recent studies in which he applied the ChatGPT-4 model to the task of role-playing a physiotherapist, psychologist, nutritionist, artificial intelligence expert and athlete in a simulated panel discussion about sports rehabilitation. He said he believes a model like that can improve AI's diagnostic accuracy by taking a conversational approach in which multiple AI agents interact.

"From a position of trust, I think it's very important to see the reasoning steps," Hu said. "In this case, high-quality data including both typical and atypical cases helps build trust."

Hu emphasized that while ChatGPT is promising, it is not a certified medical device.
He said if health care providers were to include images or other data in a clinical setting, the AI model would need to be an open-source system installed in a hospital cluster to comply with privacy laws.

Other contributors to the study were Jinge Wang, a postdoctoral fellow, and Kenneth Shue, a lab volunteer from Montgomery County, Maryland, both in the School of Medicine Department of Microbiology, Immunology and Cell Biology, as well as Li Liu of Arizona State University. The work was supported by funding from the National Institutes of Health and the National Science Foundation.

Hu said future research on using ChatGPT in emergency departments could examine whether enhancing AI's ability to explain its reasoning could contribute to triage or decisions about patient treatment.

Source: West Virginia University

Journal reference: Wang, J., et al. (2025). Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Scientific Reports. doi.org/10.1038/s41598-025-95233-1
[2]
AI's usefulness in emergency room diagnoses is limited to presentation of typical symptoms, researchers find
Artificial intelligence tools can assist emergency room physicians in accurately predicting disease, but only for patients with typical symptoms, West Virginia University scientists have found.

Gangqing "Michael" Hu, assistant professor in the WVU School of Medicine Department of Microbiology, Immunology and Cell Biology and director of the WVU Bioinformatics Core facility, led a study that compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning. His findings, published in the journal Scientific Reports, demonstrate the need for incorporating greater amounts of different types of data in training AI technology to assist in disease diagnosis.

More data can make the difference in whether AI gives patients the correct diagnoses for what are called "challenging cases," which don't exhibit classic symptoms. As an example, Hu pointed to a trio of scenarios from his study involving patients who had pneumonia without the typical fever.

"In these three cases, all of the GPT models failed to give an accurate diagnosis," Hu said. "That made us dive in to look at the physicians' notes and we noticed the pattern of these being challenging cases. ChatGPT tends to get a lot of information from different resources on the internet, but these may not cover atypical disease presentation."

The study analyzed data from 30 public emergency department cases, which, for reasons of privacy, did not include demographics. Hu explained that in using ChatGPT to assist with diagnosis, physicians' notes are uploaded and the tool is asked to provide its top three diagnoses. Results varied for the versions Hu tested: the GPT-3.5, GPT-4, GPT-4o and o1 series.

"When we looked at whether the AI models gave the correct diagnosis in any of their top three results, we didn't see a significant improvement between the new version and the older version," he said. "But when we look at each model's number one diagnosis, the new version is about 15% to 20% higher in accuracy than the older version."

Given AI models' current low performance on complex and atypical cases, Hu said human oversight is a necessity for high-quality, patient-centered care when using AI as an assistive tool.

"We didn't do this study out of curiosity to see if the new model would give better results. We wanted to establish a basis for future studies that involve additional input," Hu said. "Currently, we input physician notes only. In the future, we want to improve the accuracy by including images and findings from laboratory tests."

Hu also plans to expand on findings from one of his recent studies in which he applied the ChatGPT-4 model to the task of role-playing a physiotherapist, psychologist, nutritionist, artificial intelligence expert and athlete in a simulated panel discussion about sports rehabilitation. He said he believes a model like that can improve AI's diagnostic accuracy by taking a conversational approach in which multiple AI agents interact.

"From a position of trust, I think it's very important to see the reasoning steps," Hu said. "In this case, high-quality data including both typical and atypical cases helps build trust."

Hu emphasized that while ChatGPT is promising, it is not a certified medical device. He said if health care providers were to include images or other data in a clinical setting, the AI model would need to be an open-source system installed in a hospital cluster to comply with privacy laws.
Other contributors to the study were Jinge Wang, a postdoctoral fellow, and Kenneth Shue, a lab volunteer from Montgomery County, Maryland, both in the School of Medicine Department of Microbiology, Immunology and Cell Biology, as well as Li Liu of Arizona State University.

Hu noted that future research on using ChatGPT in emergency departments could examine whether enhancing AI's ability to explain its reasoning could contribute to triage or decisions about patient treatment.
[3]
WVU Researchers Test AI's Limits in Emergency Room Diagnoses | Newswise
Gangqing "Michael" Hu, assistant professor, Department of Microbiology, Immunology and Cell Biology, WVU School of Medicine

Newswise -- Artificial intelligence tools can assist emergency room physicians in accurately predicting disease, but only for patients with typical symptoms, West Virginia University scientists have found.

Gangqing "Michael" Hu, assistant professor in the WVU School of Medicine Department of Microbiology, Immunology and Cell Biology and director of the WVU Bioinformatics Core facility, led a study that compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning. His findings, published in the journal Scientific Reports, demonstrate the need for incorporating greater amounts of different types of data in training AI technology to assist in disease diagnosis.

More data can make the difference in whether AI gives patients the correct diagnoses for what are called "challenging cases," which don't exhibit classic symptoms. As an example, Hu pointed to a trio of scenarios from his study involving patients who had pneumonia without the typical fever.

"In these three cases, all of the GPT models failed to give an accurate diagnosis," Hu said. "That made us dive in to look at the physicians' notes and we noticed the pattern of these being challenging cases. ChatGPT tends to get a lot of information from different resources on the internet, but these may not cover atypical disease presentation."

The study analyzed data from 30 public emergency department cases, which, for reasons of privacy, did not include demographics. Hu explained that in using ChatGPT to assist with diagnosis, physicians' notes are uploaded and the tool is asked to provide its top three diagnoses. Results varied for the versions Hu tested: the GPT-3.5, GPT-4, GPT-4o and o1 series.

"When we looked at whether the AI models gave the correct diagnosis in any of their top three results, we didn't see a significant improvement between the new version and the older version," he said. "But when we look at each model's number one diagnosis, the new version is about 15% to 20% higher in accuracy than the older version."

Given AI models' current low performance on complex and atypical cases, Hu said human oversight is a necessity for high-quality, patient-centered care when using AI as an assistive tool.

"We didn't do this study out of curiosity to see if the new model would give better results. We wanted to establish a basis for future studies that involve additional input," Hu said. "Currently, we input physician notes only. In the future, we want to improve the accuracy by including images and findings from laboratory tests."

Hu also plans to expand on findings from one of his recent studies in which he applied the ChatGPT-4 model to the task of role-playing a physiotherapist, psychologist, nutritionist, artificial intelligence expert and athlete in a simulated panel discussion about sports rehabilitation. He said he believes a model like that can improve AI's diagnostic accuracy by taking a conversational approach in which multiple AI agents interact.

"From a position of trust, I think it's very important to see the reasoning steps," Hu said. "In this case, high-quality data including both typical and atypical cases helps build trust."

Hu emphasized that while ChatGPT is promising, it is not a certified medical device.
He said if health care providers were to include images or other data in a clinical setting, the AI model would need to be an open-source system installed in a hospital cluster to comply with privacy laws.

Other contributors to the study were Jinge Wang, a postdoctoral fellow, and Kenneth Shue, a lab volunteer from Montgomery County, Maryland, both in the School of Medicine Department of Microbiology, Immunology and Cell Biology, as well as Li Liu of Arizona State University. The work was supported by funding from the National Institutes of Health and the National Science Foundation.

Hu said future research on using ChatGPT in emergency departments could examine whether enhancing AI's ability to explain its reasoning could contribute to triage or decisions about patient treatment.
A study by West Virginia University researchers reveals that AI tools like ChatGPT can assist in emergency room diagnoses but struggle with atypical cases, highlighting the need for human oversight and more diverse training data.
A recent study conducted by West Virginia University (WVU) researchers has shed light on the capabilities and limitations of artificial intelligence (AI) tools in assisting emergency room physicians with diagnoses. The research, led by Gangqing "Michael" Hu, assistant professor in the WVU School of Medicine, compared the precision and accuracy of four ChatGPT models in making medical diagnoses and explaining their reasoning [1][2][3].
The study, published in the journal Scientific Reports, analyzed data from 30 public emergency department cases. For each case, physicians' notes were uploaded and the tool was asked to provide its top three diagnoses. The researchers found that AI tools can accurately predict diseases, but only for patients presenting with typical symptoms; in cases where patients exhibited atypical symptoms, the AI models struggled to provide accurate diagnoses [1].
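The articles do not reproduce the study's actual prompts, but the workflow they describe, in which a physician's note is submitted and the model is asked for its top three diagnoses with reasoning, can be illustrated with a short sketch. This is a hypothetical reconstruction using the OpenAI Python SDK; the prompt wording, the note text, and the model choice are illustrative assumptions, not the study's materials.

```python
# Hypothetical reconstruction of the diagnostic query the study describes:
# a physician's note is submitted and the model is asked for its top three
# diagnoses plus reasoning. Prompt and note text are invented for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

physician_note = (
    "72-year-old presenting with confusion and productive cough for 3 days. "
    "Afebrile. Crackles over the right lower lobe. O2 saturation 91% on room air."
)

prompt = (
    "You are assisting an emergency physician. Based on the physician note "
    "below, list your top three most likely diagnoses in order of likelihood "
    "and explain the reasoning for each.\n\n"
    f"Physician note:\n{physician_note}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # the study compared GPT-3.5, GPT-4, GPT-4o and o1-series models
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```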
Referring to three cases in the study in which patients had pneumonia without the typical fever, Hu explained, "In these three cases, all of the GPT models failed to give an accurate diagnosis. That made us dive in to look at the physicians' notes and we noticed the pattern of these being challenging cases" [2]. This limitation highlights the need to incorporate a greater variety of data types when training AI technology for diagnostic assistance.
The study tested four versions of ChatGPT: GPT-3.5, GPT-4, GPT-4o, and the o1 series. While there was no significant improvement in the models' ability to provide the correct diagnosis within their top three results, the newer versions showed 15% to 20% higher accuracy in their first-ranked diagnosis compared to older versions [1][2][3].
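Two scoring rules are being contrasted in that comparison: top-3 accuracy, which counts a case as correct if the true diagnosis appears anywhere among the model's three suggestions, and top-1 accuracy, which credits only the first-ranked suggestion. A minimal sketch of that scoring, using invented data rather than the study's cases:

```python
# Top-1 vs. top-3 accuracy over a set of cases. Each prediction is a ranked
# list of three diagnoses; the cases below are invented for illustration.
cases = [
    {"truth": "pneumonia", "predicted": ["sepsis", "pneumonia", "heart failure"]},
    {"truth": "appendicitis", "predicted": ["appendicitis", "gastroenteritis", "UTI"]},
    {"truth": "pulmonary embolism", "predicted": ["pneumonia", "asthma", "bronchitis"]},
]

top1 = sum(c["predicted"][0] == c["truth"] for c in cases) / len(cases)
top3 = sum(c["truth"] in c["predicted"] for c in cases) / len(cases)

print(f"top-1 accuracy: {top1:.0%}")  # credit only the first-ranked diagnosis
print(f"top-3 accuracy: {top3:.0%}")  # credit a hit anywhere in the top three
```

On this invented data the script reports 33% top-1 and 67% top-3 accuracy, which mirrors the pattern the study found: a ranked list can look stable on top-3 accuracy even while top-1 accuracy shifts between model versions.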
Hu and his team are exploring ways to improve AI's diagnostic accuracy:
Incorporating additional data: Future studies aim to include images and laboratory test findings to enhance AI performance [2].
Multi-agent interactions: Hu plans to expand on a recent study where ChatGPT-4 was used to simulate a panel discussion about sports rehabilitation, believing that a conversational approach with multiple AI agents could improve diagnostic accuracy [3] (see the sketch after this list).
Enhancing reasoning capabilities: Future research may focus on improving AI's ability to explain its reasoning, potentially contributing to triage or treatment decisions [1].
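None of the articles give implementation details for the multi-agent approach, but the conversational setup Hu describes, with several role-prompted agents commenting on the same case in turn, might look roughly like the sketch below. The roles, prompts, and turn order are illustrative assumptions, again using the OpenAI Python SDK.

```python
# Hypothetical sketch of a multi-agent "panel discussion" over one case:
# the same model is queried once per role, and each agent sees the
# discussion so far. Roles, prompts, and case text are assumptions.
from openai import OpenAI

client = OpenAI()

case = "72-year-old, confusion and productive cough, afebrile, right-lower-lobe crackles."
roles = ["emergency physician", "radiologist", "clinical pharmacist"]
transcript = []

for role in roles:
    discussion = "\n".join(transcript) or "(no comments yet)"
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {role} on a diagnostic panel. Be brief."},
            {"role": "user", "content": f"Case: {case}\n\nDiscussion so far:\n{discussion}\n\nGive your assessment."},
        ],
    ).choices[0].message.content
    transcript.append(f"{role}: {reply}")

print("\n\n".join(transcript))
```

One appeal of this design is that the transcript itself exposes intermediate reasoning steps, which connects to Hu's point about trust below.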
Given the current limitations of AI models in handling complex and atypical cases, Hu emphasized the necessity of human oversight for high-quality, patient-centered care when using AI as an assistive tool [2]. He stated, "From a position of trust, I think it's very important to see the reasoning steps. In this case, high-quality data including both typical and atypical cases helps build trust" [3].
Hu cautioned that while ChatGPT shows promise, it is not a certified medical device. He suggested that if healthcare providers were to include images or other data in a clinical setting, the AI model would need to be an open-source system installed in a hospital cluster to comply with privacy laws [1][3].
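The articles do not name a deployment stack, but one common way to keep notes, images, and laboratory results inside the hospital network is to serve an open-weight model behind an OpenAI-compatible endpoint on local hardware; servers such as vLLM and Ollama expose that interface. A sketch under that assumption, with the host, port, and model name as placeholders:

```python
# Hypothetical on-premises setup: the same client code as before, pointed at
# a locally hosted open-weight model so patient data never leaves the
# hospital cluster. Host, port, and model name are site-specific placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local OpenAI-compatible server, e.g. vLLM
    api_key="unused-locally",             # many local servers ignore the key
)

response = client.chat.completions.create(
    model="my-local-medical-model",  # placeholder for the deployed open-weight model
    messages=[{"role": "user", "content": "List the top three diagnoses for ..."}],
)
print(response.choices[0].message.content)
```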
This study, supported by funding from the National Institutes of Health and the National Science Foundation, provides valuable insights into the current state of AI in emergency medicine and paves the way for future advancements in this rapidly evolving field [1].
Summarized by Navi