Curated by THEOUTPOST
On Fri, 4 Apr, 8:02 AM UTC
2 Sources
[1]
New method assesses and improves the reliability of radiologists' diagnostic reports
Caption: A new calibration method developed by MIT researchers can improve the accuracy of clinical reports written by radiologists by helping them express their confidence more reliably.

Due to the inherent ambiguity in medical images like X-rays, radiologists often use words like "may" or "likely" when describing the presence of a certain pathology, such as pneumonia. But do the words radiologists use to express their confidence level accurately reflect how often a particular pathology occurs in patients?

A new study shows that when radiologists express confidence about a certain pathology using a phrase like "very likely," they tend to be overconfident, and vice versa when they express less confidence using a word like "possibly."

Using clinical data, a multidisciplinary team of MIT researchers, in collaboration with researchers and clinicians at hospitals affiliated with Harvard Medical School, created a framework to quantify how reliable radiologists are when they express certainty using natural language terms. They used this approach to provide clear suggestions that help radiologists choose certainty phrases that would improve the reliability of their clinical reporting. They also showed that the same technique can effectively measure and improve the calibration of large language models by better aligning the words models use to express confidence with the accuracy of their predictions.

By helping radiologists more accurately describe the likelihood of certain pathologies in medical images, this new framework could improve the reliability of critical clinical information.

"The words radiologists use are important. They affect how doctors intervene, in terms of their decision making for the patient. If these practitioners can be more reliable in their reporting, patients will be the ultimate beneficiaries," says Peiqi Wang, an MIT graduate student and lead author of a paper on this research.
He is joined on the paper by senior author Polina Golland, a Sunlin and Priscilla Chou Professor of Electrical Engineering and Computer Science (EECS), a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), and the leader of the Medical Vision Group; as well as Barbara D. Lam, a clinical fellow at the Beth Israel Deaconess Medical Center; Yingcheng Liu, an MIT graduate student; Ameneh Asgari-Targhi, a research fellow at Massachusetts General Brigham (MGB); Rameswar Panda, a research staff member at the MIT-IBM Watson AI Lab; William M. Wells, a professor of radiology at MGB and a research scientist in CSAIL; and Tina Kapur, an assistant professor of radiology at MGB. The research will be presented at the International Conference on Learning Representations.

Decoding uncertainty in words

A radiologist writing a report about a chest X-ray might say the image shows a "possible" pneumonia, which is an infection that inflames the air sacs in the lungs. In that case, a doctor could order a follow-up CT scan to confirm the diagnosis. However, if the radiologist writes that the X-ray shows a "likely" pneumonia, the doctor might begin treatment immediately, such as by prescribing antibiotics, while still ordering additional tests to assess severity.

Trying to measure the calibration, or reliability, of ambiguous natural language terms like "possibly" and "likely" presents many challenges, Wang says. Existing calibration methods typically rely on the confidence score provided by an AI model, which represents the model's estimated likelihood that its prediction is correct. For instance, a weather app might predict an 83 percent chance of rain tomorrow. That model is well-calibrated if, across all instances where it predicts an 83 percent chance of rain, it rains approximately 83 percent of the time.
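The weather-app notion of calibration can be checked empirically with a standard binning procedure. The sketch below is a generic expected-calibration-error-style computation, not the researchers' method (their framework extends this idea to phrases that represent distributions rather than point scores):

```python
def calibration_error(confidences, outcomes, n_bins=10):
    """Weighted mean absolute gap between average predicted confidence
    and observed frequency, computed over equal-width confidence bins.
    A well-calibrated predictor scores near zero."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, outcomes):
        # Assign each prediction to a confidence bin, e.g. 0.83 -> bin 8.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total, gap = len(confidences), 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)   # mean stated confidence
        freq = sum(h for _, h in b) / len(b)       # observed event rate
        gap += abs(avg_conf - freq) * len(b) / total
    return gap
```

With the article's example, a model that says "83 percent chance of rain" 100 times and sees rain 83 times scores essentially zero; one that says 90 percent but is right only half the time scores about 0.4.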
"But humans use natural language, and if we map these phrases to a single number, it is not an accurate description of the real world. If a person says an event is 'likely,' they aren't necessarily thinking of the exact probability, such as 75 percent," Wang says. Rather than trying to map certainty phrases to a single percentage, the researchers' approach treats them as probability distributions. A distribution describes the range of possible values and their likelihoods -- think of the classic bell curve in statistics. "This captures more nuances of what each word means," Wang adds. Assessing and improving calibration The researchers leveraged prior work that surveyed radiologists to obtain probability distributions that correspond to each diagnostic certainty phrase, ranging from "very likely" to "consistent with." For instance, since more radiologists believe the phrase "consistent with" means a pathology is present in a medical image, its probability distribution climbs sharply to a high peak, with most values clustered around the 90 to 100 percent range. In contrast the phrase "may represent" conveys greater uncertainty, leading to a broader, bell-shaped distribution centered around 50 percent. Typical methods evaluate calibration by comparing how well a model's predicted probability scores align with the actual number of positive results. The researchers' approach follows the same general framework but extends it to account for the fact that certainty phrases represent probability distributions rather than probabilities. To improve calibration, the researchers formulated and solved an optimization problem that adjusts how often certain phrases are used, to better align confidence with reality. They derived a calibration map that suggests certainty terms a radiologist should use to make the reports more accurate for a specific pathology. 
"Perhaps, for this dataset, if every time the radiologist said pneumonia was 'present,' they changed the phrase to 'likely present' instead, then they would become better calibrated," Wang explains. When the researchers used their framework to evaluate clinical reports, they found that radiologists were generally underconfident when diagnosing common conditions like atelectasis, but overconfident with more ambiguous conditions like infection. In addition, the researchers evaluated the reliability of language models using their method, providing a more nuanced representation of confidence than classical methods that rely on confidence scores. "A lot of times, these models use phrases like 'certainly.' But because they are so confident in their answers, it does not encourage people to verify the correctness of the statements themselves," Wang adds. In the future, the researchers plan to continue collaborating with clinicians in the hopes of improving diagnoses and treatment. They are working to expand their study to include data from abdominal CT scans. In addition, they are interested in studying how receptive radiologists are to calibration-improving suggestions and whether they can mentally adjust their use of certainty phrases effectively. "Expression of diagnostic certainty is a crucial aspect of the radiology report, as it influences significant management decisions. This study takes a novel approach to analyzing and calibrating how radiologists express diagnostic certainty in chest X-ray reports, offering feedback on term usage and associated outcomes," says Atul B. Shinagare, associate professor of radiology at Harvard Medical School, who was not involved with this work. "This approach has the potential to improve radiologists' accuracy and communication, which will help improve patient care." The work was funded, in part, by a Takeda Fellowship, the MIT-IBM Watson AI Lab, the MIT CSAIL Wistrom Program, and the MIT Jameel Clinic.
[2]
Assessment method may improve the reliability of radiologists' diagnostic reports
Due to the inherent ambiguity in medical images like X-rays, radiologists often use words like "may" or "likely" when describing the presence of a certain pathology, such as pneumonia. But do the words radiologists use to express their confidence level accurately reflect how often a particular pathology occurs in patients? A study shows that when radiologists express confidence about a certain pathology using a phrase like "very likely," they tend to be overconfident, and vice versa when they express less confidence using a word like "possibly." The work is published on the arXiv preprint server.

Using clinical data, a multidisciplinary team of MIT researchers, in collaboration with researchers and clinicians at hospitals affiliated with Harvard Medical School, created a framework to quantify how reliable radiologists are when they express certainty using natural language terms. The team used this approach to provide clear suggestions that help radiologists choose certainty phrases that would improve the reliability of their clinical reporting. They also showed that the same technique can effectively measure and improve the calibration of large language models by better aligning the words models use to express confidence with the accuracy of their predictions.

By helping radiologists more accurately describe the likelihood of certain pathologies in medical images, this new framework could improve the reliability of critical clinical information. "The words radiologists use are important. They affect how doctors intervene, in terms of their decision making for the patient. If these practitioners can be more reliable in their reporting, patients will be the ultimate beneficiaries," says Peiqi Wang, an MIT graduate student and lead author of a paper on this research. The research will be presented at the International Conference on Learning Representations.
Decoding uncertainty in words

A radiologist writing a report about a chest X-ray might say the image shows a "possible" pneumonia, which is an infection that inflames the air sacs in the lungs. In that case, a doctor could order a follow-up CT scan to confirm the diagnosis. However, if the radiologist writes that the X-ray shows a "likely" pneumonia, the doctor might begin treatment immediately, such as by prescribing antibiotics, while still ordering additional tests to assess severity.

Trying to measure the calibration, or reliability, of ambiguous natural language terms like "possibly" and "likely" presents many challenges, Wang says. Existing calibration methods typically rely on the confidence score provided by an AI model, which represents the model's estimated likelihood that its prediction is correct. For instance, a weather app might predict an 83% chance of rain tomorrow. That model is well-calibrated if -- across all instances where it predicts an 83% chance of rain -- it rains approximately 83% of the time.

"But humans use natural language, and if we map these phrases to a single number, it is not an accurate description of the real world. If a person says an event is 'likely,' they aren't necessarily thinking of the exact probability, such as 75%," Wang says. Rather than trying to map certainty phrases to a single percentage, the researchers' approach treats them as probability distributions. A distribution describes the range of possible values and their likelihoods -- think of the classic bell curve in statistics. "This captures more nuances of what each word means," Wang adds.

Assessing and improving calibration

The researchers leveraged prior work that surveyed radiologists to obtain probability distributions that correspond to each diagnostic certainty phrase, ranging from "very likely" to "consistent with."
For instance, since more radiologists believe the phrase "consistent with" means a pathology is present in a medical image, its probability distribution climbs sharply to a high peak, with most values clustered around the 90 to 100% range. In contrast, the phrase "may represent" conveys greater uncertainty, leading to a broader, bell-shaped distribution centered around 50%.

Typical methods evaluate calibration by comparing how well a model's predicted probability scores align with the actual number of positive results. The researchers' approach follows the same general framework, but extends it to account for the fact that certainty phrases represent probability distributions rather than probabilities. To improve calibration, the researchers formulated and solved an optimization problem that adjusts how often certain phrases are used, to better align confidence with reality. They derived a calibration map that suggests certainty terms a radiologist should use to make the reports more accurate for a specific pathology.

"Perhaps, for this dataset, if every time the radiologist said pneumonia was 'present,' they changed the phrase to 'likely present' instead, then they would become better calibrated," Wang explains.

When the researchers used their framework to evaluate clinical reports, they found that radiologists were generally underconfident when diagnosing common conditions like atelectasis, but overconfident with more ambiguous conditions like infection. In addition, the researchers evaluated the reliability of language models using their method, providing a more nuanced representation of confidence than classical methods that rely on confidence scores. "A lot of times, these models use phrases like 'certainly.' But because they are so confident in their answers, it does not encourage people to verify the correctness of the statements themselves," Wang adds.
In the future, the researchers plan to continue collaborating with clinicians in the hopes of improving diagnoses and treatment. They are working to expand their study to include data from abdominal CT scans. In addition, they are interested in studying how receptive radiologists are to calibration-improving suggestions and whether they can mentally adjust their use of certainty phrases effectively.
MIT researchers have created a framework to quantify and improve the reliability of radiologists' certainty phrases in diagnostic reports, potentially enhancing patient care and medical decision-making.
A team of researchers from MIT, in collaboration with clinicians from Harvard Medical School-affiliated hospitals, has developed a novel method to assess and enhance the reliability of radiologists' diagnostic reports. The approach aims to improve the accuracy of clinical information, potentially leading to better patient care and more informed medical decision-making [1].

Radiologists often use ambiguous language when describing pathologies in medical images due to the inherent uncertainty in interpretation. Words like "may" or "likely" are commonly employed to express varying levels of confidence. However, a new study reveals that radiologists tend to be overconfident when using phrases like "very likely" and underconfident with terms like "possibly" [2].

The researchers' framework treats certainty phrases as probability distributions rather than single percentages. This approach captures the nuances of natural language more accurately than traditional methods. For instance, the phrase "consistent with" is represented by a distribution that peaks sharply in the 90-100% range, while "may represent" has a broader, bell-shaped distribution centered around 50% [1].

To enhance the reliability of radiologists' reports, the team developed a calibration map that suggests more appropriate certainty terms for specific pathologies. This optimization process adjusts the frequency of certain phrases to better align confidence with reality. For example, changing "present" to "likely present" in some cases could improve overall calibration [2].
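A calibration map of this kind can be pictured as a per-pathology lookup from the phrase a radiologist used to the phrase the data would support. The sketch below is purely illustrative; the phrase pairs are invented, not derived from the study:

```python
# Hypothetical calibration map: for each pathology, suggested
# replacements for certainty phrases found to be miscalibrated.
# The entries are invented for illustration only.
CALIBRATION_MAP = {
    "pneumonia": {
        "present": "likely present",
        "very likely": "likely",
    },
}

def suggest_phrase(pathology, phrase):
    """Return the suggested replacement certainty phrase, or the
    original phrase when no adjustment is recommended."""
    return CALIBRATION_MAP.get(pathology, {}).get(phrase, phrase)
```

In this toy form, a report phrase is only rewritten when the map has evidence-based advice for that pathology; everything else passes through unchanged, which mirrors the targeted, per-pathology suggestions the article describes.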
The words radiologists use significantly impact patient care and treatment decisions. A report indicating a "possible" pneumonia might lead to further testing, while a "likely" pneumonia could result in immediate treatment initiation. By improving the reliability of these reports, the new framework could enhance the quality of critical clinical information and ultimately benefit patients [1].

The researchers demonstrated that their technique could also be applied to large language models, providing a more nuanced representation of confidence than classical methods. This application could encourage users to verify the correctness of AI-generated statements, particularly when models express high confidence [2].

The research team plans to continue collaborating with clinicians to further improve diagnoses and treatment strategies. As the medical field increasingly relies on AI and machine learning, ensuring the accuracy and reliability of both human and machine-generated reports becomes crucial for advancing patient care and medical decision-making [1][2].
Reference

[1] Massachusetts Institute of Technology, "New method assesses and improves the reliability of radiologists' diagnostic reports"
[2] Medical Xpress - Medical and Health News, "Assessment method may improve the reliability of radiologists' diagnostic reports"
© 2025 TheOutpost.AI All rights reserved