Stanford's Merlin AI delivers consistent diagnoses across hospitals using 3D CT scans



Stanford researchers developed Merlin AI, a vision-language model that analyzes 3D abdominal CT scans. Trained on more than 15,000 scans and nearly one million diagnostic codes, the NIH-funded tool was evaluated on more than 750 tasks, where it outperformed several specialized models. Given two scans, it picked the one more likely to be associated with a given diagnostic code over 81% of the time on average, and it performed consistently across multiple hospital sites, offering a potential way to ease the growing radiologist shortage.

Merlin AI Tackles Radiologist Shortage with Consistent Performance

Stanford University researchers have developed Merlin AI, a vision-language AI model designed to analyze 3D abdominal CT scans with consistent diagnostic performance across different healthcare settings [1]. The NIH-funded machine learning tool addresses a critical need in radiology, where roughly 30 million abdominal CT scans are performed annually in the United States amid a growing physician shortage [1]. This imbalance has increased radiologists' workload and the risk of suboptimal care, driving development of radiology AI solutions, which now account for approximately 75% of all AI-enabled medical devices approved by the US Food and Drug Administration [1].

Source: News-Medical


Foundation Model Trained on Unprecedented Clinical Dataset

Merlin represents a new class of foundation model trained on the largest collection of abdominal CT data to date [2]. The team at Stanford University School of Medicine trained the model on more than 15,000 3D abdominal CT scans paired with radiology reports and nearly one million diagnostic codes from electronic health records [2]. This multimodal pre-training approach allowed Merlin to learn relationships between visual imaging data and written clinical information. The researchers have made their vision-language dataset of 25,494 CT scans and radiology reports open-source to accelerate further research [1].

Exceptional Zero-Shot Performance Across Multiple Hospital Sites

To test Merlin's generalizability, researchers assessed its accuracy at classifying 30 abdominal abnormalities using 37,855 CT scans obtained from three external hospital sites, without fine-tuning the model, a capability known as zero-shot performance [1]. Across all sites, Merlin consistently outperformed the second-best alternative architecture baseline, achieving a 19.7% average improvement in classification accuracy [1]. At the individual hospital sites, Merlin outperformed the next-best vision-language model by 34.4%, 15.7%, and 8.9%, demonstrating robustness to variations in patient demographics, imaging protocols, and radiologists' reporting practices [1].
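
The article does not describe how Merlin produces its zero-shot predictions. As a rough illustration of how vision-language models are commonly used for zero-shot classification in general, the sketch below compares a scan embedding with the embeddings of two text prompts and picks the closer one. Everything here is an assumption made for illustration: the prompt wording, the three-dimensional toy vectors, and the helper names `cosine` and `zero_shot_label` are hypothetical and are not taken from Merlin's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_label(image_emb, prompt_embs):
    """Return the prompt whose embedding is most similar to the image embedding.

    No task-specific training happens here: the decision relies only on
    embeddings that a pre-trained image encoder and text encoder would supply.
    """
    return max(prompt_embs, key=lambda name: cosine(image_emb, prompt_embs[name]))

# Toy, hand-written vectors standing in for real encoder outputs (hypothetical).
image_emb = [0.8, 0.1, 0.6]
prompt_embs = {
    "finding present": [0.9, 0.0, 0.5],
    "finding absent": [0.1, 0.9, 0.2],
}

predicted = zero_shot_label(image_emb, prompt_embs)
print(predicted)
```

In a real system the vectors would come from the model's image and text encoders, and the same comparison would be repeated for each of the abnormalities being classified.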

Source: Nature


Merlin AI Excels at Diverse Diagnostic and Prognostic Tasks

The researchers tested Merlin across six broad categories spanning more than 750 individual tasks that included diagnostics, prognostics, and quality assessment [2]. On average across 692 different diagnostic codes, Merlin successfully predicted which of two scans was more likely to be associated with a particular code over 81% of the time, outperforming several specialized models [2]. For a subset of 102 codes, performance rose to 90% [2]. Co-first author Louis Blankemeier, Ph.D., explained that "with Merlin, you could potentially go beyond traditional radiology and jump straight from imaging to a possible diagnosis" [2].
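
The "which of two scans" figure reads like a pairwise ranking measure: over every pairing of a scan that carries the code with one that does not, count how often the model scores the first one higher. That quantity is mathematically the same as the area under the ROC curve. The article does not name the metric, so treating it this way is an assumption, and the scores and labels below are invented purely to show the arithmetic. The function name `pairwise_ranking_accuracy` is likewise hypothetical.

```python
from itertools import product

def pairwise_ranking_accuracy(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scan scores higher.

    Ties count as half a win, which makes the result equal to the AUROC.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative scan")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented model scores for one diagnostic code across five scans.
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]  # 1 = code recorded for the patient, 0 = not recorded

result = pairwise_ranking_accuracy(scores, labels)
print(f"{result:.3f}")  # 5 of the 6 pairs are ordered correctly
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect ordering, which gives some context for the reported 81% average and the 90% subset.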

Early Disease Diagnosis and Biomarker Discovery Potential

Merlin demonstrated capability in predicting the onset of chronic diseases such as diabetes, osteoporosis, and heart disease in healthy patients based solely on CT scans [2]. When comparing scans from different subjects, Merlin could identify patients at higher risk of developing a particular disease in the next five years 75% of the time, versus 68% for competing models [2]. These findings suggest the model can detect key features in scans that may be invisible to human eyes, potentially helping identify new biomarkers for disease classification [2].

Path Toward Clinical Adoption and Future Applications

Bruce Tromberg, Ph.D., Director of NIH's National Institute of Biomedical Imaging and Bioengineering, noted that "rich datasets like this are necessary to push the limits of what artificial intelligence models can accomplish in medicine" [2]. The researchers attribute Merlin's strong performance to its full 3D image encoder and large-scale multimodal training approach [1]. Co-first author Ashwin Kumar and senior author Akshay S. Chaudhari acknowledge that clinical deployment will require prospective validation and evaluation on additional tasks such as radiology-report generation across external data distributions [1]. The tool's ability to automate complex clinical diagnostics could significantly streamline clinical workflows, and its performance on chest CT scans, achieved with no chest imaging in its training data, suggests even broader applications may be possible [2].
