4 Sources
[1]
These medical X-rays are all deepfakes -- and they fool even radiologists
Most radiologists struggle to identify X-ray scans generated by artificial intelligence, with fewer than half spotting synthetic images hidden in real medical data, according to research published today in Radiology. Large language models (LLMs) also had a hard time telling real medical images from synthetic ones. The study provides training to help radiologists improve their skills in detecting AI-generated X-rays, and the researchers warn that synthetic data could creep into the scientific literature and medical litigation.

"The results from this study are both disturbing and not very surprising to me," says Elisabeth Bik, a microbiologist and image-integrity specialist based in San Francisco, California. "This raises concerns not only for research integrity, but also for clinical workflows, insurance claims and legal contexts where imaging evidence is used."

In the study, 17 radiologists from 12 research centres were presented with X-ray scans; half were real, and half were generated by AI. Without knowing the purpose of the study, participants were asked about the technical quality of the AI images and whether they noticed anything unusual. Only 41% raised concerns that AI scans might have infiltrated the data set. The radiologists were then informed that some of the images were AI-generated and asked to discern real scans from those created by ChatGPT. On average, the participants correctly identified the AI and real scans 75% of the time. Importantly, "there was no difference based on the experience of the radiologists", who had between zero and 40 years of professional experience, says study co-author Mickael Tordjman, a radiologist at the Icahn School of Medicine at Mount Sinai in New York.

The research team also investigated whether AI models such as ChatGPT and Gemini might have a more discerning eye than the radiologists, but the models were only 57-85% accurate at teasing apart the real and ChatGPT-generated images.

That AI-generated images are so hard to detect could pose significant challenges for the scientific community. Among the most obvious is that litigation and insurance claims might be compromised by fake images, Tordjman says. There's also the risk that synthetic scans will infiltrate the scientific literature, he adds. Peer reviewers and journals have been inundated with papers containing AI-generated images and medical data, despite policies from publishers that ban or limit their use. And researchers worry that AI-generated content containing inaccuracies misleads both scientists and the public.

Another concern is that the AI models used to read medical imaging data could become distorted by AI-generated data in their training data sets. This could cause the models to "latch onto features that are not exactly relevant to real medical cases, but are purely artefacts of generative AI models", says Siwei Lyu, who researches media forensics at the University at Buffalo in New York. At the same time, researchers have found that synthetic data can improve the performance of radiographic AI models. More lifelike AI radiographs could ultimately be good news for the medical community if they fill gaps in AI models' training data, says Curtis Langlotz, a radiologist at Stanford University in California.

The study authors have created an interactive quiz that aims to teach researchers how to discern between AI-generated and real X-ray scans.
The quiz covers tell-tale characteristics of AI-generated radiographs, such as overly smooth bones and unnaturally straight spines. As well as training radiologists, the authors note, techniques such as digital watermarking could make it easier to identify real images, as could regulatory frameworks from governmental agencies.
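To make the watermarking idea concrete, here is a minimal sketch (an editor's illustration, not the study's method) of a least-significant-bit watermark in Python. The function names and payload scheme are assumptions; real deployments would use far more robust, standards-based schemes that survive compression and resampling.

```python
# Minimal least-significant-bit (LSB) watermark sketch using NumPy.
# Illustrative only: this toy scheme does not survive compression,
# resampling, or windowing, all routine in medical imaging pipelines.
import numpy as np

def embed_watermark(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a bit pattern in the least-significant bit of the first pixels."""
    flat = image.astype(np.uint8).ravel()          # work on a copy of the pixels
    n = min(bits.size, flat.size)
    flat[:n] = (flat[:n] & 0xFE) | (bits[:n] & 1)  # clear each LSB, write payload bit
    return flat.reshape(image.shape)

def extract_watermark(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the first n_bits least-significant bits back out."""
    return image.astype(np.uint8).ravel()[:n_bits] & 1

# Round-trip check on a synthetic 8-bit "radiograph".
rng = np.random.default_rng(0)
xray = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)
payload = rng.integers(0, 2, size=64, dtype=np.uint8)
assert np.array_equal(extract_watermark(embed_watermark(xray, payload), 64), payload)
```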
[2]
Deepfake X-rays are so real even doctors can't tell the difference
A new study published on March 24 in Radiology, the journal of the Radiological Society of North America (RSNA), shows that both radiologists and multimodal large language models (LLMs) have difficulty telling real X-rays apart from artificial intelligence (AI)-generated "deepfake" images. The findings raise concerns about the risks posed by synthetic medical images and highlight the need for better tools and training to help protect the accuracy of medical imaging and prepare health care professionals to recognize deepfakes. A "deepfake" is a video, photo, image or audio file that appears authentic but has been created or altered using AI.

"Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present," said lead study author Mickael Tordjman, M.D., post-doctoral fellow at the Icahn School of Medicine at Mount Sinai, New York. "This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital's network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record."

Study Details and Image Testing

The research included 17 radiologists from 12 institutions across six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates). Their experience ranged from beginners to specialists with up to 40 years in practice. In total, the study examined 264 X-ray images, split evenly between real scans and AI-generated ones. Participants reviewed two separate image sets with no overlap. One set contained a mix of real images and ChatGPT-generated X-rays from different parts of the body. The second set focused on chest X-rays, with half real and half created using RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

Detection Accuracy for Radiologists and AI

When radiologists were not told that fake images were included, only 41% recognized the AI-generated X-rays after evaluating their technical quality. Once they were informed that synthetic images were present, their average accuracy in distinguishing real from fake rose to 75%. Performance varied widely among individuals: radiologists correctly identified between 58% and 92% of the ChatGPT-generated images. AI systems showed similar limitations. Four multimodal LLMs -- GPT-4o (OpenAI), GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta) -- achieved accuracy rates ranging from 57% to 85%. Even ChatGPT-4o, which was used to generate the deepfake images, did not detect all of them, though it performed better than the other models. For the RoentGen-generated chest X-rays, radiologists achieved accuracy rates between 62% and 78%, while the AI models ranged from 52% to 89%.

Experience Does Not Guarantee Detection

The study found no link between a radiologist's years of experience and their ability to identify fake X-rays. However, musculoskeletal radiologists performed significantly better than other subspecialists.

Visual Clues in Deepfake X-Rays

Researchers identified several patterns that can appear in synthetic images. "Deepfake medical images often look too perfect," Dr. Tordjman said.
"Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone." Risks and Safeguards for Medical Imaging The results highlight serious risks if deepfake X-rays are misused. Fabricated images could be used in legal cases or inserted into hospital systems to influence diagnoses and disrupt care. To reduce these threats, researchers recommend stronger digital protections. These include invisible watermarks embedded directly into images and cryptographic signatures linked to the technologist at the time of image capture, which can help verify authenticity. The Future of AI in Medical Imaging "We are potentially only seeing the tip of the iceberg," Dr. Tordjman said. "The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical." To support education and awareness, the researchers have released a curated deepfake dataset that includes interactive quizzes for training purposes.
[3]
Deepfake X-rays can deceive radiologists and AI systems
Radiological Society of North America, March 24, 2026

Neither radiologists nor multimodal large language models (LLMs) can easily distinguish artificial intelligence (AI)-generated "deepfake" X-ray images from authentic ones, according to a study published today in Radiology, a journal of the Radiological Society of North America (RSNA). The findings highlight the potential risks associated with AI-generated X-ray images, along with the need for tools and training to protect the integrity of medical images and prepare health care professionals to detect deepfakes. The term "deepfake" refers to a video, photo, image or audio recording that appears real but has been created or manipulated using AI.

"Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present. This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital's network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record," said Mickael Tordjman, M.D., lead study author and post-doctoral fellow at the Icahn School of Medicine at Mount Sinai, New York.

Seventeen radiologists from 12 different centers in six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images in the study were authentic, and the other half were generated by AI. Radiologists were evaluated on two distinct image sets, with no overlap between the datasets. The first dataset included real and ChatGPT-generated images of multiple anatomical regions. The second dataset included chest X-ray images, half authentic and half created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

When radiologist readers were unaware of the study's true purpose and were asked, after rating the technical quality of each ChatGPT image, whether they noticed anything unusual, only 41% spontaneously identified AI-generated images. After being informed that the dataset contained synthetic images, the radiologists' mean accuracy in differentiating real from synthetic X-rays was 75%. Individual radiologist performance in detecting the ChatGPT-generated images ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) ranged from 57% to 85%. Even ChatGPT-4o, the model used to create the deepfakes, was unable to detect all of them, though it identified considerably more than the Google and Meta models. Radiologist accuracy in detecting the RoentGen synthetic chest X-rays ranged from 62% to 78%, and the LLMs' performance ranged from 52% to 89%.

There was no correlation between a radiologist's years of experience and their accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists demonstrated significantly higher accuracy than other radiology subspecialists. The study also identified common features of synthetic X-rays. "Deepfake medical images often look too perfect," Dr. Tordjman said.
"Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone." Recommended solutions to clearly distinguish real and fake images and help prevent tampering include implementing advanced digital safeguards, such as invisible watermarks that embed ownership or identity data directly into the images and automatically attaching technologist-linked cryptographic signatures when the images are captured. "We are potentially only seeing the tip of the iceberg," Dr. Tordjman said. "The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical." The study's authors have published a curated deepfake dataset with interactive quizzes for educational purposes. Radiological Society of North America Journal reference: Tordjman, M., et al. (2026). The Rise of Deepfake Medical Imaging: Radiologists' Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs. Radiology. DOI: 10.1148/radiol.252094. https://pubs.rsna.org/doi/10.1148/radiol.252094
[4]
AI-generated medical scans prove nearly indistinguishable from reality
AI-generated X-rays can now appear convincing enough to mislead even expert radiologists, according to a study published March 24 in Radiology, the journal of the Radiological Society of North America. The research found that radiologists were able to correctly identify whether X-ray images were real or synthetic only 75% of the time on average, despite knowing that fake images were included in the dataset.

The study involved 17 radiologists from 12 research centers across six countries: the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates. Participants reviewed 264 X-ray images, split evenly between authentic scans and AI-generated ones. Before learning the study's true aim, only 41% of them independently suspected that some of the images may have been produced by artificial intelligence.

Lead author Mickael Tordjman, a radiologist at the Icahn School of Medicine at Mount Sinai in New York, said the findings show how far synthetic medical imaging has advanced. "Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present," he said.

Performance varied widely among the radiologists, with individual scores ranging from 58% to 92%. The researchers found no meaningful relationship between detection ability and professional experience, which ranged from no experience to 40 years. Musculoskeletal radiologists performed better than other subspecialists. The team also tested four multimodal large language models, including GPT-4o and Gemini 2.5 Pro, which achieved accuracy rates between 57% and 85% when assessing ChatGPT-generated images.

The implications extend well beyond image interpretation. Tordjman said the results point to a serious vulnerability in areas such as legal disputes and hospital cybersecurity, including scenarios in which fake fractures could be used in fraudulent claims or synthetic scans could be inserted into medical systems to influence diagnoses.

Outside experts also raised concerns. Elisabeth Bik, a microbiologist and specialist in image integrity, told Nature that the findings were "both disturbing and not very surprising," and said the risks affect research integrity, clinical workflows, insurance claims, and legal proceedings that rely on imaging evidence.

Tordjman noted that some synthetic images still contain subtle warning signs, including bones that appear overly smooth, spines that look unnaturally straight, and blood vessel patterns that seem too uniform. Still, the study suggests that visual inspection alone is no longer a reliable safeguard. The authors called for countermeasures such as invisible watermarking and cryptographic signatures applied at the moment images are captured.

The researchers also warned that the current problem may be only an early stage of a broader challenge. "We are potentially only seeing the tip of the iceberg," Tordjman said, pointing to AI-generated 3D scans such as CT and MRI as a likely next step. He said building training datasets and detection tools now will be essential before those threats become even harder to manage.
A new study published in Radiology reveals that radiologists can correctly distinguish real from AI-generated medical X-rays only 75% of the time, even when aware synthetic images are present. The research, involving 17 radiologists from six countries, found that only 41% initially suspected deepfake infiltration. Large language models like ChatGPT and Gemini performed similarly poorly, achieving 57-85% accuracy, highlighting urgent concerns about fraudulent litigation, hospital cybersecurity, and research integrity.

Radiologists struggle to identify AI-generated medical X-rays, correctly distinguishing between real and AI-generated X-rays only 75% of the time on average, according to a study published March 24 in Radiology, the journal of the Radiological Society of North America [1][2]. The research involved 17 radiologists from 12 research centers across six countries (the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates) who reviewed 264 X-ray images split evenly between authentic scans and deepfake X-rays [3]. When participants were initially unaware of the study's purpose and asked about technical quality, only 41% spontaneously raised concerns that AI-generated medical images might have infiltrated the dataset [2].

Lead study author Mickael Tordjman, a radiologist at the Icahn School of Medicine at Mount Sinai in New York, emphasized the severity of these findings. "Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present," he stated [3]. Individual performance varied widely, with radiologists correctly identifying ChatGPT-generated images between 58% and 92% of the time [2]. Importantly, professional experience ranging from zero to 40 years showed no correlation with detection accuracy, though musculoskeletal radiologists demonstrated significantly higher accuracy than other subspecialists [3].

The research team tested whether large language models (LLMs) might perform better than human experts at identifying synthetic medical images. Four multimodal LLMs (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) achieved accuracy rates ranging from 57% to 85% when evaluating ChatGPT-generated images [2][4]. Notably, even ChatGPT-4o, the model used to create the deepfake images, could not detect all of them, though it performed better than the competing models [3]. For RoentGen-generated chest X-rays, created with an open-source generative AI diffusion model developed by Stanford Medicine researchers, radiologists achieved accuracy rates between 62% and 78%, while the AI models ranged from 52% to 89% [2].

The implications extend far beyond diagnostic accuracy. Tordjman warned of high-stakes vulnerabilities for fraudulent litigation, where fabricated fractures could be indistinguishable from real ones [2]. Hospital cybersecurity threats represent another critical concern: if hackers gained access to a hospital's network, they could inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of digital medical records [3]. Elisabeth Bik, a microbiologist and image-integrity specialist, told Nature that the results were "both disturbing and not very surprising," raising concerns for research integrity, clinical workflows, insurance claims, and legal contexts where imaging evidence is used [1][4].

Another emerging threat involves training-data contamination. Siwei Lyu, who researches media forensics at the University at Buffalo, warned that diagnostic AI models could become distorted if AI-generated data infiltrates their training datasets, causing models to "latch onto features that are not exactly relevant to real medical cases, but are purely artefacts of generative AI models" [1].
Researchers identified several visual clues that can appear in synthetic images. "Deepfake medical images often look too perfect," Tordjman explained [2]. Common characteristics include bones that are overly smooth, spines that are unnaturally straight, lungs that are overly symmetrical, blood vessel patterns that are excessively uniform, and fractures that appear unusually clean and consistent, often limited to one side of the bone [3]. However, the study suggests that visual inspection alone is no longer a reliable safeguard [4].
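To make the "too perfect" cue concrete, here is a toy statistic (an editor's sketch, not the study's detector): the variance of the image Laplacian, which tends to drop when textures are unnaturally smooth. The function name and the idea of cohort comparison are illustrative assumptions.

```python
# Toy illustration only: smoother images carry less high-frequency energy,
# so the variance of the Laplacian is lower for unnaturally "clean" scans.
import numpy as np
from scipy.ndimage import laplace

def smoothness_score(image: np.ndarray) -> float:
    """Variance of the Laplacian; lower values indicate a smoother image."""
    return float(np.var(laplace(image.astype(np.float64))))
```

A forensic tool might compare a suspect radiograph's score against a cohort of known-real scans and flag unusually low outliers for closer review, though no such threshold is validated here.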
To combat these threats, researchers recommend implementing advanced digital safeguards. These include invisible watermarks that embed ownership or identity data directly into images, and cryptographic signatures that are automatically attached and linked to the technologist at the time of image capture [3][4].
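Complementing the signing sketch in source [2] above, here is a minimal, self-contained illustration of the verification side, again using Ed25519 from the Python cryptography library; any change to the pixel bytes, or a wholesale injected image, fails the check. The workflow is an assumption, not the study's specification.

```python
# Hedged sketch of verifying a capture-time signature before an image is trusted.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

key = ed25519.Ed25519PrivateKey.generate()  # stands in for the capture-time key
image_bytes = b"\x00\x10\x7f\xff"           # stand-in pixel data
signature = key.sign(image_bytes)           # attached at acquisition

def is_authentic(public_key, data: bytes, sig: bytes) -> bool:
    """True only if the signature matches the exact bytes signed at capture."""
    try:
        public_key.verify(sig, data)  # raises InvalidSignature on any mismatch
        return True
    except InvalidSignature:
        return False

assert is_authentic(key.public_key(), image_bytes, signature)
assert not is_authentic(key.public_key(), image_bytes + b"\x00", signature)
```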
The study authors have created an interactive quiz designed to teach researchers how to discern between AI-generated and real X-ray scans, along with a curated deepfake dataset for educational purposes [1][2].
Tordjman cautioned that current challenges may represent only the beginning. "We are potentially only seeing the tip of the iceberg," he said, pointing to AI generation of synthetic 3D medical images such as CT and MRI as the logical next step [2][4]. Establishing educational datasets and detection tools now is critical before these threats become even harder to manage, he emphasized.