Radiologists spot deepfake X-rays only 75% of the time, raising concerns about medical fraud

Reviewed by Nidhi Govil

A new study reveals that radiologists correctly identify AI-generated X-rays only 75% of the time, even when they are aware synthetic images are present. The research involved 17 radiologists from six countries and tested scans generated by both ChatGPT and RoentGen. Experts warn of serious implications for fraudulent litigation, hospital cybersecurity, and research integrity as synthetic medical imaging becomes indistinguishable from reality.

Deepfake X-rays Deceive Expert Radiologists at Alarming Rates

Radiologists struggle to distinguish deepfake X-rays from authentic medical scans, achieving only 75% accuracy on average even when explicitly told that artificial intelligence-generated images were mixed into their datasets, according to research published in Radiology by the Radiological Society of North America [1][2]. The study involved 17 radiologists from 12 research centers across six countries (the United States, France, Germany, Turkey, the United Kingdom, and the United Arab Emirates) who reviewed 264 X-ray images split evenly between real and AI-generated scans [3]. Before learning the study's true purpose, only 41% of participants spontaneously suspected that synthetic medical imaging had infiltrated the dataset [2].

Source: Nature

Lead author Mickael Tordjman, a radiologist at the Icahn School of Medicine at Mount Sinai in New York, emphasized the gravity of these findings: "Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present" [2]. Individual performance ranged dramatically from 58% to 92% accuracy, with no correlation between years of professional experience (which spanned from zero to 40 years) and detection ability [1][3]. Musculoskeletal radiologists demonstrated significantly higher accuracy than other subspecialists [2].

Large Language Models Also Struggle with Detection

The research team tested whether large language models might outperform human experts in identifying synthetic images. Four multimodal systems (GPT-4o and GPT-5 from OpenAI, Gemini 2.5 Pro from Google, and Llama 4 Maverick from Meta) were evaluated on the same task [2]. Their accuracy ranged from 57% to 85% when assessing ChatGPT-generated images, and from 52% to 89% when evaluating RoentGen synthetic chest X-rays [2]. Notably, even GPT-4o, the model used to create some of the deepfakes, could not accurately detect all of them, though it performed considerably better than competing systems [2].
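
For readers curious how such an evaluation might be wired up, the sketch below shows one way to pose the real-versus-synthetic question to a multimodal model through the OpenAI Python SDK. This is a minimal sketch, not the study's actual protocol; the prompt wording and file path are illustrative assumptions.

```python
# Minimal sketch (not the study's protocol): ask a multimodal model
# whether an X-ray looks real or AI-generated. The prompt wording and
# file path are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_xray(path: str) -> str:
    """Return the model's one-word verdict: REAL or SYNTHETIC."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is this chest X-ray a real scan or AI-generated? "
                         "Answer with exactly one word: REAL or SYNTHETIC."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

print(classify_xray("example_xray.png"))  # hypothetical file
```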

Risks of Deepfake Medical Images Span Multiple Domains

The implications extend far beyond diagnostic accuracy. Tordjman warned of "high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one" [2]. Hospital cybersecurity faces new threats if hackers gain access to medical networks and inject synthetic images to manipulate patient diagnoses or undermine the reliability of digital medical records [2][3]. Elisabeth Bik, a microbiologist and image-integrity specialist, described the findings as "both disturbing and not very surprising," noting concerns for research integrity, clinical workflows, insurance claims, and legal contexts where imaging evidence is used [1][3].

Source: News-Medical

Another concern involves AI models trained on corrupted datasets. Siwei Lyu, who researches media forensics at the University at Buffalo, explained that models could "latch onto features that are not exactly relevant to real medical cases, but are purely artefacts of generative AI models" [1]. Peer reviewers and journals have already been inundated with papers containing AI-generated images despite policies that ban or limit their use [1].

Detecting Unnaturally Perfect Anatomical Features

The study identified telltale characteristics of synthetic images that can help with detection. "Deepfake medical images often look too perfect," Tordjman explained. "Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone" [2][3]. The study authors created an interactive quiz covering these characteristics to train researchers to distinguish AI-generated from real X-ray scans [1][2].
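
The "too perfect" cues Tordjman lists are, in principle, quantifiable. As a toy illustration only, and not a method from the study, the sketch below scores how mirror-symmetric an image is by correlating it with its horizontal flip; an unusually high score would be consistent with the "overly symmetrical lungs" cue, though genuine detection would require far more sophisticated analysis.

```python
# Toy illustration, not a validated detector: score how mirror-symmetric
# an X-ray is by computing the Pearson correlation between the image and
# its left-right flip. A score suspiciously close to 1.0 echoes the
# "overly symmetrical" cue described in the study.
import numpy as np
from PIL import Image

def mirror_symmetry_score(path: str) -> float:
    """Pearson correlation (in [-1, 1]) between an image and its mirror."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    a = img.ravel()
    b = img[:, ::-1].ravel()   # left-right flipped copy
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom else 0.0

score = mirror_symmetry_score("chest_xray.png")  # hypothetical file
print(f"mirror symmetry: {score:.3f}")
```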

Digital Watermarking and Cryptographic Signatures as Solutions

To protect medical imaging integrity, researchers recommend implementing advanced digital safeguards. Digital watermarking can embed ownership or identity data directly into images, while cryptographic signatures linked to technologists can be automatically attached when images are captured [1][2][3]. Regulatory frameworks from governmental agencies could also help establish standards for identifying authentic medical images [1].
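
As a rough sketch of the signing idea, and not any vendor's actual implementation, the example below uses an Ed25519 key pair from the Python cryptography package to sign image bytes at capture and verify them later. Because the signature covers the raw bytes, any pixel-level edit or substitution invalidates it; key management, watermark embedding, and DICOM integration are deliberately out of scope.

```python
# Rough sketch of attach-at-capture signing (assumed workflow, not a
# vendor implementation): an Ed25519 key tied to the capturing device
# or technologist signs the raw image bytes, so later tampering or
# substitution fails verification.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In practice this key would live in the modality's secure hardware.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_image(image_bytes: bytes) -> bytes:
    """Signature produced at capture time over the raw image bytes."""
    return private_key.sign(image_bytes)

def verify_image(image_bytes: bytes, signature: bytes) -> bool:
    """Check an image against the registered public key."""
    try:
        public_key.verify(signature, image_bytes)
        return True
    except InvalidSignature:
        return False

pixels = open("scan.png", "rb").read()       # hypothetical captured image
sig = sign_image(pixels)
assert verify_image(pixels, sig)             # untouched image verifies
assert not verify_image(pixels + b"x", sig)  # any alteration is caught
```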

Tordjman cautioned that current challenges may represent only the beginning: "We are potentially only seeing the tip of the iceberg. The logical next step in this evolution is AI-generation of synthetic 3D scans (CT and MRI). Establishing educational datasets and detection tools now is critical" [2][3]. As synthetic medical imaging technology advances, the medical community must prepare for increasingly sophisticated deepfakes that could compromise patient care, legal proceedings, and scientific research if left unchecked.
