Curated by THEOUTPOST
On Fri, 15 Nov, 12:02 AM UTC
6 Sources
[1]
ChatGPT Diagnoses Illnesses Better Than Human Doctors: Study
ChatGPT outperformed human doctors at diagnosing diseases and medical conditions in a recent study. The findings, published last month, suggest that artificial intelligence (AI) chatbots may be more efficient at analysing patient histories and conditions and may provide more accurate diagnoses. While the study set out to understand whether AI chatbots could help doctors provide better diagnoses, the results unexpectedly revealed that OpenAI's GPT-4-powered chatbot performed much better on its own than when paired with a doctor.
The study, published in the journal JAMA Network Open, was conducted by a group of researchers at the Beth Israel Deaconess Medical Center in Boston. The experiment aimed to find out whether AI can help doctors diagnose diseases better than traditional methods. According to a New York Times report, the experiment involved 50 doctors, a mix of residents and attending physicians, recruited through multiple large hospital systems in the US. They were given six case histories of patients and asked to suggest a diagnosis for each case and explain why they favoured or ruled out certain diagnoses. Doctors were also graded on whether their final diagnosis was right.
To evaluate each participant's performance, medical experts were reportedly selected as graders. While the graders were shown the answers, they were not told whether a response came from a doctor with access to AI, from a doctor alone, or from ChatGPT by itself. Further, to rule out unrealistic case histories, the researchers reportedly picked case histories of real patients that have been used by researchers for decades but have never been published, to avoid contamination. This point is important because ChatGPT could not have been trained on data that has never been published.
The findings were surprising. Doctors who did not use any AI tool had an average score of 74 percent, whereas physicians who used the chatbot scored 76 percent on average. However, when ChatGPT alone analysed the case histories and provided diagnoses, it scored an average of 90 percent. While various factors could have affected the outcome of the study -- from the experience level of the doctors to individual biases towards certain diagnoses -- the researchers believe it highlights that the potential of AI systems in medical institutions cannot be ignored.
[2]
ChatGPT beat doctors at diagnosing medical conditions, study says
The study asked fifty doctors, 26 of whom were attending physicians and 24 of whom were residents, to diagnose medical conditions from the same six case histories. Some doctors were given OpenAI's ChatGPT to help them make their decisions; others went without AI. Doctors who worked without AI got an average score of 74%, doctors who used AI got an average score of 76%, and ChatGPT by itself got an average score of 90%.
Dr. Rodman, who helped design the study, told the New York Times he expected the chatbot would significantly help the doctors using it. He was "shocked" to see it made little difference, and shocked again that the AI by itself beat the doctors. AI didn't help doctors as much as anticipated because physicians "didn't listen to AI when AI told them things they didn't agree with," Rodman said. Most doctors, he said, were wedded to their own diagnoses and couldn't be convinced that a chatbot knew more than they did. Another issue was that many doctors didn't know how to take full advantage of AI's capabilities. The study's authors wrote that because AI "alone demonstrated higher performance than both physician groups," there is a "need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice."
[3]
ChatGPT Defeated Doctors at Diagnosing Illness
A small study found ChatGPT outdid human physicians when assessing medical case histories, even when those doctors were using a chatbot.
Dr. Adam Rodman, an expert in internal medicine at Beth Israel Deaconess Medical Center in Boston, confidently expected that chatbots built to use artificial intelligence would help doctors diagnose illnesses. He was wrong.
Instead, in a study Dr. Rodman helped design, doctors who were given ChatGPT-4 along with conventional resources did only slightly better than doctors who did not have access to the bot. And, to the researchers' surprise, ChatGPT alone outperformed the doctors. "I was shocked," Dr. Rodman said.
The chatbot, from the company OpenAI, scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.
The study showed more than just the chatbot's superior performance. It unveiled doctors' sometimes unwavering belief in a diagnosis they had made, even when a chatbot suggested a potentially better one. And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems' ability to solve complex diagnostic problems and offer explanations for their diagnoses.
A.I. systems should be "doctor extenders," Dr. Rodman said, offering valuable second opinions on diagnoses. But it looks as if there is a way to go before that potential is realized.
Case History, Case Future
The experiment involved 50 doctors, a mix of residents and attending physicians recruited through a few large American hospital systems, and was published last month in the journal JAMA Network Open. The test subjects were given six case histories and were graded on their ability to suggest diagnoses and explain why they favored or ruled them out. Their grades also included getting the final diagnosis right.
The graders were medical experts who saw only the participants' answers, without knowing whether they were from a doctor with ChatGPT, a doctor without it or from ChatGPT by itself.
The case histories used in the study were based on real patients and are part of a set of 105 cases that has been used by researchers since the 1990s. The cases intentionally have never been published so that medical students and others could be tested on them without any foreknowledge. That also meant that ChatGPT could not have been trained on them.
But, to illustrate what the study involved, the investigators published one of the six cases the doctors were tested on, along with answers to the test questions on that case from a doctor who scored high and from one whose score was low.
That test case involved a 76-year-old patient with severe pain in his low back, buttocks and calves when he walked. The pain started a few days after he had been treated with balloon angioplasty to widen a coronary artery. He had been treated with the blood thinner heparin for 48 hours after the procedure. The man complained that he felt feverish and tired. His cardiologist had done lab studies that indicated a new onset of anemia and a buildup of nitrogen and other kidney waste products in his blood. The man had had bypass surgery for heart disease a decade earlier.
The case vignette continued with details of the man's physical exam and then provided his lab test results. The correct diagnosis was cholesterol embolism -- a condition in which shards of cholesterol break off from plaque in arteries and block blood vessels.
Participants were asked for three possible diagnoses, with supporting evidence for each. They also were asked to provide, for each possible diagnosis, findings that do not support it or that were expected but not present. The participants also were asked to provide a final diagnosis. Then they were to name up to three additional steps they would take in their diagnostic process.
Like the diagnosis for the published case, the diagnoses for the other five cases in the study were not easy to figure out. But neither were they so rare as to be almost unheard-of. Yet the doctors on average did worse than the chatbot. What, the researchers asked, was going on? The answer seems to hinge on questions of how doctors settle on a diagnosis, and how they use a tool like artificial intelligence.
The Physician in the Machine
How, then, do doctors diagnose patients? The problem, said Dr. Andrew Lea, a historian of medicine at Brigham and Women's Hospital who was not involved with the study, is that "we really don't know how doctors think." In describing how they came up with a diagnosis, doctors would say, "intuition," or, "based on my experience," Dr. Lea said. That sort of vagueness has challenged researchers for decades as they tried to make computer programs that can think like a doctor.
The quest began almost 70 years ago. "Ever since there were computers, there were people trying to use them to make diagnoses," Dr. Lea said. One of the most ambitious attempts began in the 1970s at the University of Pittsburgh. Computer scientists there recruited Dr. Jack Myers, chairman of the medical school's department of internal medicine, who was known as a master diagnostician. He had a photographic memory and spent 20 hours a week in the medical library, trying to learn everything that was known in medicine.
Dr. Myers was given medical details of cases and explained his reasoning as he pondered diagnoses. Computer scientists converted his logic chains into code. The resulting program, called INTERNIST-1, included over 500 diseases and about 3,500 symptoms of disease. To test it, researchers gave it cases from the New England Journal of Medicine. "The computer did really well," Dr. Rodman said. Its performance "was probably better than a human could do," he added.
But INTERNIST-1 never took off. It was difficult to use, requiring more than an hour to give it the information needed to make a diagnosis. And, its creators noted, "the present form of the program is not sufficiently reliable for clinical applications."
Research continued. By the mid-1990s there were about a half dozen computer programs that tried to make medical diagnoses. None came into widespread use. "It's not just that it has to be user friendly, but doctors had to trust it," Dr. Rodman said.
And with the uncertainty about how doctors think, experts began to ask whether they should care. How important is it to try to design computer programs to make diagnoses the same way humans do? "There were arguments over how much a computer program should mimic human reasoning," Dr. Lea said. "Why don't we play to the strength of the computer?" The computer may not be able to give a clear explanation of its decision pathway, but does that matter if it gets the diagnosis right?
The conversation changed with the advent of large language models like ChatGPT. They make no explicit attempt to replicate a doctor's thinking; their diagnostic abilities come from their ability to predict language. "The chat interface is the killer app," said Dr. Jonathan H. Chen, a physician and computer scientist at Stanford who was an author of the new study. "We can pop a whole case into the computer," he said. "Before a couple of years ago, computers did not understand language." But many doctors may not be exploiting its potential.
Operator Error
After his initial shock at the results of the new study, Dr. Rodman decided to probe a little deeper into the data and look at the actual logs of messages between the doctors and ChatGPT. The doctors must have seen the chatbot's diagnoses and reasoning, so why didn't those using the chatbot do better?
It turns out that the doctors often were not persuaded by the chatbot when it pointed out something that was at odds with their diagnoses. Instead, they tended to be wedded to their own idea of the correct diagnosis. "They didn't listen to A.I. when A.I. told them things they didn't agree with," Dr. Rodman said. That makes sense, said Laura Zwaan, who studies clinical reasoning and diagnostic error at Erasmus Medical Center in Rotterdam and was not involved in the study. "People generally are overconfident when they think they are right," she said.
But there was another issue: many of the doctors did not know how to use a chatbot to its fullest extent. Dr. Chen said he noticed that when he peered into the doctors' chat logs, "they were treating it like a search engine for directed questions: 'Is cirrhosis a risk factor for cancer? What are possible diagnoses for eye pain?'" "It was only a fraction of the doctors who realized they could literally copy-paste in the entire case history into the chatbot and just ask it to give a comprehensive answer to the entire question," Dr. Chen added. "Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing."
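To make the whole-case prompting point concrete, here is a minimal sketch of how an entire case vignette could be sent to a chat model in a single request rather than as a series of directed questions. It uses the openai Python client; the model name, system prompt, and the diagnose_case helper are illustrative assumptions, not the prompts or tooling used in the study.

```python
# Minimal sketch (assumed prompt wording, not the study's): submit a full case
# vignette in one request and ask for a structured differential diagnosis.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def diagnose_case(case_vignette: str, model: str = "gpt-4o") -> str:
    """Return the model's ranked differential for one complete case vignette."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "You are assisting a physician. Given a full case history, "
                    "list three possible diagnoses with supporting and opposing "
                    "findings, a final diagnosis, and up to three next steps."
                ),
            },
            # The entire vignette goes in as one message, so the model can
            # weigh all findings together instead of answering isolated questions.
            {"role": "user", "content": case_vignette},
        ],
    )
    return response.choices[0].message.content


# Example usage with an abbreviated vignette:
# print(diagnose_case(
#     "76-year-old man with severe low back, buttock and calf pain on walking, "
#     "starting days after balloon angioplasty; heparin for 48 hours afterward; "
#     "feels feverish and tired; new-onset anemia and rising kidney waste products."
# ))
```

The design point is simply that the full vignette is supplied at once, which is the usage pattern Dr. Chen describes as being adopted by only a fraction of the participating doctors.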
[4]
Does AI improve doctors' diagnoses? Study puts it to the test
With hospitals already deploying artificial intelligence to improve patient care, a new study has found that using ChatGPT Plus does not significantly improve the accuracy of doctors' diagnoses when compared with the use of usual resources.
The study, from UVA Health's Andrew S. Parsons, MD, MPH and colleagues, enlisted 50 physicians in family medicine, internal medicine and emergency medicine to put ChatGPT Plus to the test. Half were randomly assigned to use ChatGPT Plus to diagnose complex cases, while the other half relied on conventional methods such as medical reference sites (for example, UpToDate) and Google. The researchers then compared the resulting diagnoses, finding that the accuracy across the two groups was similar.
That said, ChatGPT alone outperformed both groups, suggesting that it still holds promise for improving patient care. Physicians, however, will need more training and experience with the emerging technology to capitalize on its potential, the researchers conclude. For now, ChatGPT remains best used to augment, rather than replace, human physicians, the researchers say.
"Our study shows that AI alone can be an effective and powerful tool for diagnosis," said Parsons, who oversees the teaching of clinical skills to medical students at the University of Virginia School of Medicine and co-leads the Clinical Reasoning Research Collaborative. "We were surprised to find that adding a human physician to the mix actually reduced diagnostic accuracy though improved efficiency. These results likely mean that we need formal training in how best to use AI."
ChatGPT for Disease Diagnosis
Chatbots called "large language models" that produce human-like responses are growing in popularity, and they have shown impressive ability to take patient histories, communicate empathetically and even solve complex medical cases. But, for now, they still require the involvement of a human doctor.
Parsons and his colleagues were eager to determine how the high-tech tool can be used most effectively, so they launched a randomized, controlled trial at three leading-edge hospitals -- UVA Health, Stanford and Harvard's Beth Israel Deaconess Medical Center. The participating doctors made diagnoses for "clinical vignettes" based on real-life patient-care cases. These case studies included details about patients' histories, physical exams and lab test results. The researchers then scored the results and examined how quickly the two groups made their diagnoses.
The median diagnostic accuracy for the doctors using ChatGPT Plus was 76.3%, while the median for the physicians using conventional approaches was 73.7%. The ChatGPT group reached their diagnoses slightly more quickly overall -- 519 seconds compared with 565 seconds.
The researchers were surprised at how well ChatGPT Plus alone performed, with a median diagnostic accuracy of more than 92%. They say this may reflect the prompts used in the study, suggesting that physicians likely will benefit from training on how to use prompts effectively. Alternately, they say, healthcare organizations could purchase predefined prompts to implement in clinical workflow and documentation.
The researchers also caution that ChatGPT Plus likely would fare less well in real life, where many other aspects of clinical reasoning come into play -- especially in determining downstream effects of diagnoses and treatment decisions.
They're urging additional studies to assess large language models' abilities in those areas and are conducting a similar study on management decision-making. "As AI becomes more embedded in healthcare, it's essential to understand how we can leverage these tools to improve patient care and the physician experience," Parsons said. "This study suggests there is much work to be done in terms of optimizing our partnership with AI in the clinical environment." Following up on this groundbreaking work, the four study sites have also launched a bi-coastal AI evaluation network called ARiSE (AI Research and Science Evaluation) to further evaluate GenAI outputs in healthcare. Find out more information at the ARiSE website.
[5]
ChatGPT Plus doesn't outperform conventional methods in diagnosing patients
University of Virginia Health System, Nov 14 2024
With hospitals already deploying artificial intelligence to improve patient care, a new study has found that using ChatGPT Plus does not significantly improve the accuracy of doctors' diagnoses when compared with the use of usual resources.
The study, from UVA Health's Andrew S. Parsons, MD, MPH and colleagues, enlisted 50 physicians in family medicine, internal medicine and emergency medicine to put ChatGPT Plus to the test. Half were randomly assigned to use ChatGPT Plus to diagnose complex cases, while the other half relied on conventional methods such as medical reference sites (for example, UpToDate) and Google. The researchers then compared the resulting diagnoses, finding that the accuracy across the two groups was similar.
That said, ChatGPT alone outperformed both groups, suggesting that it still holds promise for improving patient care. Physicians, however, will need more training and experience with the emerging technology to capitalize on its potential, the researchers conclude. For now, ChatGPT remains best used to augment, rather than replace, human physicians, the researchers say.
"Our study shows that AI alone can be an effective and powerful tool for diagnosis," said Parsons, who oversees the teaching of clinical skills to medical students at the University of Virginia School of Medicine and co-leads the Clinical Reasoning Research Collaborative. "We were surprised to find that adding a human physician to the mix actually reduced diagnostic accuracy though improved efficiency. These results likely mean that we need formal training in how best to use AI."
ChatGPT for disease diagnosis
Chatbots called "large language models" that produce human-like responses are growing in popularity, and they have shown impressive ability to take patient histories, communicate empathetically and even solve complex medical cases. But, for now, they still require the involvement of a human doctor.
Parsons and his colleagues were eager to determine how the high-tech tool can be used most effectively, so they launched a randomized, controlled trial at three leading-edge hospitals - UVA Health, Stanford and Harvard's Beth Israel Deaconess Medical Center. The participating doctors made diagnoses for "clinical vignettes" based on real-life patient-care cases. These case studies included details about patients' histories, physical exams and lab test results. The researchers then scored the results and examined how quickly the two groups made their diagnoses.
The median diagnostic accuracy for the doctors using ChatGPT Plus was 76.3%, while the median for the physicians using conventional approaches was 73.7%. The ChatGPT group reached their diagnoses slightly more quickly overall - 519 seconds compared with 565 seconds.
The researchers were surprised at how well ChatGPT Plus alone performed, with a median diagnostic accuracy of more than 92%. They say this may reflect the prompts used in the study, suggesting that physicians likely will benefit from training on how to use prompts effectively. Alternately, they say, healthcare organizations could purchase predefined prompts to implement in clinical workflow and documentation.
The researchers also caution that ChatGPT Plus likely would fare less well in real life, where many other aspects of clinical reasoning come into play - especially in determining downstream effects of diagnoses and treatment decisions.
They're urging additional studies to assess large language models' abilities in those areas and are conducting a similar study on management decision-making.
"As AI becomes more embedded in healthcare, it's essential to understand how we can leverage these tools to improve patient care and the physician experience. This study suggests there is much work to be done in terms of optimizing our partnership with AI in the clinical environment," said Andrew S. Parsons, MD, MPH, of UVA Health.
Following up on this groundbreaking work, the four study sites have also launched a bi-coastal AI evaluation network called ARiSE (AI Research and Science Evaluation) to further evaluate GenAI outputs in healthcare. Find out more information at the ARiSE website.
Findings published
The researchers have published their results in the scientific journal JAMA Network Open. The research team consisted of Ethan Goh, Robert Gallo, Jason Hom, Eric Strong, Yingjie Weng, Hannah Kerman, Joséphine A. Cool, Zahir Kanjee, Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew P.J. Olson, Adam Rodman and Jonathan H. Chen. Funding for this research was provided by the Gordon and Betty Moore Foundation. A full list of disclosures and funding sources is included in the paper.
Journal reference: Goh, E., et al. (2024). Large Language Model Influence on Diagnostic Reasoning. JAMA Network Open. doi.org/10.1001/jamanetworkopen.2024.40969.
[6]
Can AI Boost Accuracy of Doctors' Diagnoses?
FRIDAY, Nov. 15, 2024 (HealthDay News) -- AI can't yet help doctors improve their ability to diagnose complex conditions, a sobering new study has found.
Doctors had about the same diagnostic accuracy whether or not they were using ChatGPT Plus, according to results published recently in the journal JAMA Network Open. However, the AI outperformed doctors when allowed to diagnose on its own, the researchers noted.
"Our study shows that AI alone can be an effective and powerful tool for diagnosis," said researcher Dr. Andrew Parsons, who oversees the teaching of clinical skills to medical students at the University of Virginia School of Medicine. "We were surprised to find that adding a human physician to the mix actually reduced diagnostic accuracy though improved efficiency," Parsons said. "These results likely mean that we need formal training in how best to use AI."
For the study, researchers provided 50 doctors with case studies based on real-life patients. These cases included details about medical history, physical exams and lab test results. The doctors were randomly assigned to two groups -- one that diagnosed the patients' conditions based solely on the info available and standard reference materials, and another that used ChatGPT Plus to help inform their diagnosis.
Doctors using ChatGPT returned an accurate diagnosis about 76% of the time, compared with about 74% for those not aided by AI, results show. The ChatGPT group did come to their diagnoses slightly faster -- about 8.6 minutes compared to 9.4 minutes for those without AI help, researchers found. When ChatGPT Plus was given the case studies on its own, it achieved an accuracy of more than 92%.
However, researchers caution that the AI would likely fare less well in real life, when diagnosing patients on the fly versus evaluating case studies. More study is needed to assess the ability of AI to diagnose medical problems, particularly in terms of the downstream effects of diagnoses and the treatment decisions that result from them, researchers said.
"As AI becomes more embedded in healthcare, it's essential to understand how we can leverage these tools to improve patient care and the physician experience," Parsons said. "This study suggests there is much work to be done in terms of optimizing our partnership with AI in the clinical environment."
A recent study reveals that ChatGPT, when used alone, significantly outperformed both unassisted doctors and doctors working with AI assistance in diagnosing medical conditions, raising questions about the future of AI in healthcare.
A groundbreaking study conducted by researchers at Beth Israel Deaconess Medical Center in Boston has unveiled surprising results regarding the diagnostic capabilities of artificial intelligence (AI) in healthcare. The study, published in the journal JAMA Network Open, found that OpenAI's ChatGPT outperformed human doctors in diagnosing medical conditions [1].
The experiment involved 50 doctors, including both residents and attending physicians, recruited from multiple large hospital systems in the United States. Participants were presented with six case histories of real patients and asked to provide diagnoses and explanations for their reasoning [2].
To ensure fairness and eliminate potential bias, the case histories used were from a set that has been utilized by researchers since the 1990s but never published, preventing ChatGPT from having prior exposure to the information [3].
The study's findings were unexpected:
- Doctors working without AI scored an average of 74 percent.
- Doctors using the chatbot scored an average of 76 percent.
- ChatGPT alone scored an average of 90 percent.
Dr. Adam Rodman, one of the study's designers, expressed shock at the results, particularly the minimal improvement when doctors used AI assistance and ChatGPT's superior performance when used independently [3].
The study highlighted several key issues:
- Doctors often dismissed the chatbot's suggestions when they conflicted with their own diagnoses.
- Many physicians treated the chatbot like a search engine for directed questions rather than giving it the full case history.
- Few participants knew how to exploit the chatbot's ability to analyse an entire case and explain its reasoning.
While the study demonstrates AI's potential in medical diagnosis, researchers caution that real-life scenarios involve additional factors not accounted for in the experiment. The findings suggest a need for:
- Formal training for physicians in how best to use AI and prompt it effectively.
- Technology and workforce development to realize the potential of physician-AI collaboration in clinical practice.
- Further studies on the downstream effects of diagnoses and treatment decisions.
Following this study, a bi-coastal AI evaluation network called ARiSE (AI Research and Science Evaluation) has been launched to further investigate the potential of AI in healthcare. Additionally, researchers are conducting a similar study focused on management decision-making [5].
As AI continues to evolve and integrate into healthcare systems, understanding its optimal use and potential impact on patient care remains a critical area of research and development.
References
[1] ChatGPT Diagnoses Illnesses Better Than Human Doctors: Study
[2] ChatGPT beat doctors at diagnosing medical conditions, study says
[3] ChatGPT Defeated Doctors at Diagnosing Illness
[4] Does AI improve doctors' diagnoses? Study puts it to the test
[5] ChatGPT Plus doesn't outperform conventional methods in diagnosing patients