8 Sources
[1]
AI can reason like a physician -- what comes next?
Large language models (LLMs) are artificial intelligence (AI) algorithms that are trained on vast amounts of data to learn patterns that enable them to generate human-like responses. Reasoning models are LLMs with the added capability of working through problems step by step before responding, thus mirroring structured thinking. Such AI systems have performed well in assessing medical knowledge, but whether they can match physician-level clinical reasoning on authentic diagnostic tasks remains largely unknown. On page 524 of this issue, Brodeur et al. (1) demonstrate that AI can now seemingly match or exceed physician-level clinical diagnostic reasoning on text-based scenarios by measuring against human physician performance on clinical vignettes and real-world emergency cases. The findings indicate an urgent need to understand how these tools can be safely integrated into clinical workflows, and a readiness for prospective evaluation alongside clinicians.

AI has the potential to support a broad range of health care applications, from clinical decisions to medical education and the provision of patient-facing health information. LLMs have passed medical licensing examinations and performed well on structured clinical assessments, raising the prospect that they could help alleviate global health care workforce shortages. However, passing examinations is not the same as being a doctor, and demonstrating physician-level performance on authentic clinical tasks is a fundamentally harder challenge (2).

Brodeur et al. evaluated OpenAI's first reasoning model, o1-preview (released in September 2024), across five experiments that assessed diagnostic performance on clinical case vignettes against physician and prior-model baselines. A sixth experiment compared o1 with prior models and physicians across three diagnostic touchpoints on 76 actual emergency department cases. Across the experiments, the o1 models substantially outperformed prior-generation nonreasoning LLMs (e.g., GPT-4) and, in many cases, the physicians themselves. For example, when provided with published clinicopathological conference cases, GPT-4 achieved exact or very close diagnostic accuracy in 72.9% of cases, whereas o1-preview achieved this in 88.6% of cases. Further, in actual emergency department cases, o1 achieved 67.1% exact or very close diagnostic accuracy at initial triage, outperforming two expert attending physicians (55.3% and 50.0%), with blinded reviewers unable to distinguish the AI output from that of humans.

This advance sets a new evaluation benchmark -- testing AI against physician performance, and ideally alongside physicians, on authentic clinical tasks. Although the o1 models were limited to text-only input, reasoning capabilities, deliberation time, and the ability to process multimodal inputs have improved substantially in more recent models, expanding the complexity of tasks they can undertake. Notably, reasoning models such as GPT-5.3 and Gemini 3.1 Pro now process text, images, audio, and video together. Brodeur et al. establish a foundation for authentic evaluation across text-based tasks, but clinical practice inherently involves visual and auditory cues, such as findings from physical examinations. Multimodal AI offers the potential for assessments that more closely mirror actual clinical diagnosis in practice (2).
Future work should therefore evaluate the latest models on the scenarios of Brodeur et al., test multimodal capabilities using vignettes that incorporate visual and auditory data, and progress toward prospective clinical assessments.

Although the findings of Brodeur et al. indicate that AI can seemingly perform diagnostic tasks as well as, if not better than, physicians in specific contexts, the prevailing proposal for AI in health care is not replacement but collaboration, with clinicians providing oversight, contextual judgment, and accountability. That collaborative configuration itself must be tested. Prior work that used clinical vignettes to assess diagnostic and management reasoning found no substantial difference between the performance of physicians augmented with GPT-4 and that of the GPT-4 model working alone, although both outperformed physicians with only conventional resources (3). More broadly, it has been argued that for certain well-defined tasks across health care, AI may operate more effectively independently (4). Indeed, determining the optimal implementation will likely require evaluations that compare AI alone, clinician alone, and clinician with AI. With clinicians already integrating AI tools into practice, in some cases without institutional oversight (5), the evidence generated by this triad will be essential for determining when AI integration improves care and when it does not.

Brodeur et al. focused on diagnostic reasoning tasks, but this is only one domain in which medical AI is being developed. One proposed framework [Medical Holistic Evaluation of Language Models (MedHELM)] uses a taxonomy that includes five domains: clinical decision support, clinical note generation, patient communication, medical research assistance, and administrative workflows (6). Across these areas, AI models are evolving from static question-and-answer tools into agents that can, for example, analyze patient records, monitor clinical encounters through ambient listening, and interact in real time with predictive models built on patient data. Whatever the application, the benchmark for use in clinical practice cannot be synthetic performance; it must be improvement in real-world applications, ideally demonstrated through randomized trials.

A clinical certification pathway for AI modeled on physician training has also been proposed (7). This pathway progresses AI from medical knowledge assistants to specialty task performance, supervised clinical practice, and, ultimately, broader autonomous scope. The study of Brodeur et al. is a step along this pathway, demonstrating that reasoning models are advancing from knowledge platforms to specialty task performance. The next stage should extend this evaluation to multimodal AI in supervised clinical settings.

Although evaluation methods are progressing, the deployment of AI systems is outpacing them. Accuracy on a validated task does not guarantee that a deployed system will confine itself to that task. For example, in January 2026, OpenAI launched ChatGPT Health, a consumer AI tool promoted as a personalized health information source that can address more than 40 million health-related questions submitted each day. The tool was not designed for clinical triage, yet it did not refuse triage tasks; the first independent evaluation found that it under-triaged more than half of the emergencies presented to it (8).
The authors of that evaluation rightly argue that independent evaluation of consumer health AI is essential and that there was a lack of clarity about what ChatGPT Health was for; however, independent evaluations must be rigorous enough to support actionable conclusions. Without physician comparators such as those used by Brodeur et al., it remains unclear whether a clinician given the same information would have performed better, which limits what the medical community can recommend. Clear task definitions and transparent human benchmarks equip the medical community to hold developers accountable for performance on defined clinical tasks.

Accuracy on a defined task is only one dimension of deployment readiness. Clinical AI must also deliver equitable, cost-effective, and safe outcomes, supported by accountability, transparency, and ongoing monitoring. The Journal of the American Medical Association (JAMA) summit on AI in health in 2024 concluded that most health AI efforts still fail to demonstrate real-world effectiveness (an ability to improve outcomes in practice, not just perform well on benchmarks) or equity, calling for multistakeholder engagement, robust measurement tools, data infrastructure that reflects diverse populations, and policy and transparency incentives that drive evaluations targeted at priority issues (9). The risks of not meeting these recommendations are documented, with examples including a widely used health care algorithm that exhibited substantial racial bias affecting equitable health expenditure (10); suboptimal health safeguards in publicly accessible AI tools (11, 12); and biased AI that decreased clinicians' diagnostic accuracy (13). Without robust demonstrated effectiveness, equity, and safety, many AI systems will remain insufficient for clinical use.
[2]
AI Outperforms ER Doctors in Diagnostic Cases, Study Points to Collaborative Care
Have you ever thought about how artificial intelligence compares to a human physician in an emergency diagnostic setting? New research published Thursday might have you pondering that question.

The study, published in the journal Science, found that a state-of-the-art large language model outperformed human doctors on a range of common clinical tasks. Using real emergency department data and hundreds of physician comparisons, the model matched or even exceeded human clinician performance in diagnostic choices, emergency triage and determining next steps in management. The authors of the study said those results do not mean AI models are ready to replace human doctors. Instead, the results indicate that industry professionals need faster, more rigorous standards for evaluation and rules for using AI in medicine.

The researchers tested OpenAI's o1 series large language model, released in 2024, across six experiments that blended standardized clinical cases with a real-world sample of randomly selected emergency room patients at a medical center in Massachusetts. The model's advantage was most evident in early-stage triage, when decisions must be made with little information. Both the human clinicians and the AI model improved as more data became available to them, but the study found that the LLM handled uncertainty far better, using fragmented or unstructured health data and notes more effectively.

These findings build on decades of using difficult diagnostic cases to evaluate medical-computing systems. Earlier LLMs already outperformed older algorithmic approaches, but what sets this study apart is the scale and the head-to-head comparison between human doctors and AI in real clinical scenarios.

The authors stressed that we should remain skeptical of these results. Real clinical work in hospitals and emergency rooms often relies on visual and auditory cues -- rather than text-based reasoning alone -- which AI cannot yet interpret fully and accurately. "Future work is needed to assess how humans and machines may effectively collaborate in the use of nontext signals," the study notes. When considering AI-assisted medical care, it's also critical to assess whether it will be safe, equitable and cost-effective, aspects that were not tested in this study.

"Long story short, the model outperformed our very large physician baseline. You'll see this in detail, but this included board-certified, actively practicing physicians and real messy cases," Arjun Manrai, an assistant professor of biomedical informatics at Harvard Medical School, said during a virtual press briefing.

"I don't think our findings mean that AI replaces doctors, despite what some companies are likely to say, and how they're likely to use these results," Manrai said. "I think it does mean that we're witnessing a really profound change in technology that will reshape medicine, and that we need to evaluate this technology now, and rigorously conduct in prospective clinical trials."

Regulators, hospitals and healthcare providers should work together to test these tools thoroughly before they're deployed to ensure safety and equity for all patients. In a commentary also published Thursday in Science, Ashley M.
Hopkins and Eric Cornelisse, researchers at Flinders University in Australia, wrote that the study is a step toward better evaluation of AI systems in healthcare, but that medicine is a complex field that requires rigorous oversight to ensure patients receive the best possible care. "We do not allow doctors to practice without supervision and evaluation, and AI should be held to comparable standards," Cornelisse said in a statement.
[3]
Can AI help doctors avoid missed diagnoses? A new study suggests yes
Humans still have important roles to play in medicine, experts stress.

In some of medicine's toughest cases, the hardest part isn't choosing the right diagnosis. It's thinking of it at all. Artificial intelligence may now be better at that than doctors, a new study suggests.

"We're witnessing a really profound change in technology that will reshape medicine," Harvard University biomedical data scientist Arjun Manrai said in an April 28 news conference. That change is driven by advances in large language models, the same technology OpenAI's ChatGPT is built on. New versions, called reasoning models, can work through complex problems step by step.

As of 2025, 1 in 5 doctors and nurses worldwide used AI for a second opinion on complex cases, and over half want to use it for this purpose, according to a survey of more than 2,000 clinicians. But how well the technology works in a medical setting has been debated.

Manrai and colleagues tested OpenAI's o1-preview model on a range of medical cases, including classic sets of symptoms used in medical training as well as real-world data drawn directly from the charts of 76 patients who visited an emergency room in Boston. Across those clinical reasoning tests, the AI model was more likely than physicians to include the correct diagnosis, or something very close to it, among its possible answers, the researchers report April 30 in Science.

Not all researchers are convinced that this means we should trust AI with our diagnoses, arguing that AI reasoning is still far from what human doctors can do. "When we say clinical reasoning, it doesn't mean the same thing as moral reasoning," says Arya Rao, a researcher at Harvard Medical School who was not involved in the study. "These models have been optimized to do this kind of sequential thought that we call reasoning, but it's not at all the same thing as how we teach medical students to reason."

Manrai is not opposed to the critique, noting that AI technology should assist rather than replace people in medical roles. "Ultimately, I think humans want humans to guide them ... through challenging treatment decisions," he said.

Still, the results show that this type of AI "works for making diagnoses in the real world," coauthor Adam Rodman, a doctor at Beth Israel Deaconess Medical Center in Boston, said at the news conference. He described a patient who came into the emergency room with what seemed like routine respiratory symptoms but who had recently undergone an organ transplant and was immunosuppressed. The patient turned out to have a dangerous flesh-eating infection requiring surgery. "The model actually was suspicious of this [infection] from the very beginning, probably 12 to 24 hours before the human physician would have become suspicious of this," Rodman said.

Rao applauds the team for presenting [AI] "as an extension of a physician, not a replacement." She calls the study "rigorous and thoughtful." However, she does not think there's enough evidence to say that AI models have aced clinical reasoning. Her team released a study April 13 that tested 21 AI models at each step of the process toward reaching a diagnosis. Reasoning models got the highest scores overall. But when Rao's team drilled down to identify which parts of the diagnostic process were trickiest for AI, the researchers found a weak point that persisted from the oldest models to the newest: the process of considering several different uncertain diagnoses. AI models based on LLMs tend to jump to conclusions.
"Their reasoning is brittle precisely where uncertainty and nuance matter most," Rao and her team wrote in their paper. Their conclusion was that LLMs are not yet ready to make decisions in medical settings. These two studies evaluated different AI models in different ways. Yet, the results aren't as opposed as they may seem on the surface, both teams say. They agree that the next step should be more research. Manrai's team is planning clinical trials to help answer the question: "How do we safely and thoughtfully integrate [AI] into care?" Rao likes that approach. So many people "don't have enough access to care," she says. Someday, she notes, "I think AI can be a great equalizer."
[4]
AI Just Beat Doctors at Diagnosing ER Patients. Don't Get All Excited
Emergency departments and other clinical settings across the world are now one step closer to sounding like the cockpit of the Millennium Falcon -- with human doctors soliciting advice from, bickering with, and not infrequently trusting the guidance of their opinionated AI colleagues.

Researchers at Harvard and Boston's Beth Israel Deaconess Medical Center have successfully tested an advanced large language model (LLM) AI against two attending physicians (humans) on their performance diagnosing incoming emergency room patients at the triage phase. The LLM, OpenAI's first so-called "reasoning" model, o1-preview, made the correct call -- what the researchers deemed "exact or a very close" diagnostic accuracy -- in 67.1% of the 76 actual emergency department cases put to it in the new study, published today in the journal Science. Two expert physicians sourced from elite university medical institutions, however, scored only 55.3% and 50.0% accuracy, respectively, with blinded physician reviewers unable to tell the o1 and human-made diagnoses apart.

The new study also pitted o1 and OpenAI's prior non-reasoning LLMs, such as GPT-4, against physicians' past testing baselines in diagnosing 143 complex cases published as clinical vignettes in The New England Journal of Medicine. "o1-preview included the correct diagnosis in its differential in 78.3% of these cases," according to one of the study's lead authors, doctoral candidate Thomas Buckley of Harvard Medical School's Department of Biomedical Informatics, who spoke at a press briefing Tuesday. "And when expanding to a differential diagnosis that would have been helpful," Buckley continued, "we found that o1-preview suggested a helpful diagnosis in 97.9% of cases." The results, he noted, not only outperformed GPT-4 but also vastly outpaced a human physician baseline published in Nature, where physicians with the freedom to consult search engines and standard medical resources had an accuracy of 44.5%. (That study, though, included a larger and perhaps thornier set of 302 clinical vignettes.)

"I don't think our findings mean that AI replaces doctors," study coauthor Arjun Manrai, who teaches biomedical informatics at Harvard, took pains to emphasize at the press briefing, "despite what some companies are likely to say." Manrai did, however, describe the team's results as evidence of a "really profound change in technology that will reshape medicine," one that would require rigorous testing to verify its utility in actually making patient outcomes better.

Two independent medical researchers, who commented on the new study in a piece published concurrently in Science, echoed this view. "The prevailing proposal for AI in health care is not replacement but collaboration," they noted, "with clinicians providing oversight, contextual judgment, and accountability."

Study coauthor Adam Rodman, an internal medicine physician at Beth Israel, likened the possible legal status of AI diagnoses to the current paradigm with clinical decision support (CDS), the existing digital tools doctors already use while retaining personal culpability for those choices. "I will tell you, as a practicing physician, that would be a limitation to widespread adoption of all of this, if the regulatory system is 'Just trust me,'" Rodman said at the briefing.
"I would have to see extraordinarily strong evidence, such as a randomized controlled trial, where I would do that for my patients."

Reasoning models like o1-preview differ from the AI chatbots you might be used to in that these LLMs have been built to work through problems in structured steps, mirroring more deductive thinking, before delivering answers to a prompt. The system still has its limitations, which, according to the researchers, include real difficulty diagnosing medical cases involving multimodal input, meaning the images and audio evidence that would readily help a human doctor diagnose a patient's case. "They're underperforming on most medical imaging benchmarks," Buckley said. "I think a really active area of research over the next decade is how do we improve the multimodal integration capabilities of these models."

Yujin Potter, an AI research scientist at the University of California, Berkeley, who reviewed the new study for Gizmodo, noted that the team's finished paper was quiet on more troubling issues now known to plague AI. Potter, who's not involved with the new research, co-published a study in March detailing how teams of AI can spontaneously develop and act on their own goals when tasked to work in coordination, actively deceiving their human users and exfiltrating files to hide on different servers. "This paper is informative. It's good. But also, this actually means that we also need to understand AI safety better," Potter told Gizmodo. "People should keep in their mind that AI can also hallucinate and give them the wrong information -- and even malicious or misaligned AI can manipulate them."

At the Tuesday briefing, Buckley acknowledged that he and his colleagues "didn't formally measure the hallucination rate of these models." "We do know that models such as o1 do hallucinate," Buckley added, "but in the significant majority of cases, we are finding that the model is suggesting something at least helpful, and then in a huge amount of cases, it's suggesting the exact diagnosis in the original case." Manrai, Buckley's coauthor, added: "My mantra is still 'trust, but verify.'"
[5]
In real-world test, an AI model did better than ER doctors at diagnosing patients
A patient shows up at the hospital with a pulmonary embolism -- a blood clot that has traveled to the lungs. After initially improving, their symptoms start to worsen. The medical team suspects the medication isn't working. In steps artificial intelligence -- with its own theory. It has scanned the medical records and suspects that a history of lupus, an autoimmune condition which can lead to heart inflammation, could explain what was really ailing the patient. Turns out, the AI model is correct.

This type of scenario could become a reality in the not-too-distant future, according to a study published Thursday in the journal Science. Researchers based at Harvard Medical School and Beth Israel Deaconess Medical Center found that an AI reasoning model, developed by OpenAI, excelled at diagnosing patients and making decisions about managing their care. It matched and often outperformed doctors and the earlier AI model, GPT-4.

The researchers ran a series of experiments on the AI model to test its clinical acumen, including actual cases like the lupus patient who'd been previously treated at the emergency department at Beth Israel in Boston. The team graded how well the AI model could provide an accurate diagnosis at three moments in time, from the triage stage in the ER up to admission to the hospital. Overall, the AI outperformed two experienced physicians -- and did so with only the electronic health records and the limited information that had been available to the physicians at the time.

"This is the big conclusion for me -- it works with the messy real-world data of the emergency department," said Dr. Adam Rodman, a clinical researcher at Beth Israel and one of the study authors. "It works for making diagnoses in the real world."

Other parts of the study relied on tricky case reports published in the New England Journal of Medicine and clinical vignettes to suss out whether the AI model could meet well-established "benchmarks" and game out thorny diagnostic questions. "The model outperformed our very large physician baseline," said Raj Manrai, assistant professor of biomedical informatics at Harvard Medical School, who was also part of the study.

The authors emphasize that the research relied on text alone, while in real life clinicians need to attend to many other inputs, such as images, sounds and nonverbal cues, when diagnosing and treating a patient. Still, the work showcases just how far the technology has advanced in the last few years. Prior generations of large language models faltered when dealing with uncertainty and in generating a list of possible conditions to consider, what's known as a differential diagnosis.

"This paper is a beautiful summary of just how much things have improved," says Dr. David Reich, chief clinical officer for the Mount Sinai Health System in New York, who was not involved in the work. "You have something which is quite accurate, possibly ready for prime time," he says. "Now the open question is how the heck do you introduce it into clinical workflows in ways that actually improve care?"

After all, arriving at some tricky final diagnosis -- which the AI model shines at -- isn't necessarily reflective of how things play out "in real clinical medicine," says Reich, where the "outcomes are much more subtle and perhaps more diverse." And the emergency department is only a small portion of the patient's total medical care.
Rodman acknowledges it's unlikely AI would have done such an "impressive" job had the team provided it with the records of someone who'd spent a month in the hospital. None of those involved in the new study believe the findings support supplanting doctors with AI, "despite what some companies are likely to say and how they're likely to use these results," says Manrai. "I think it does mean that we're witnessing a really profound change in technology that will reshape medicine," he adds. But the results do make the case that AI models need to be tested in a rigorous fashion, ideally through forward-looking trials that can give more certainty about how the technology ultimately impacts clinical practice. "It's a very challenging process to design these trials," says Reich, "but this study is a perfect call to action."
[6]
AI outperforms doctors in Harvard trial of emergency triage diagnoses
Researchers say results mark a 'profound change in technology that will reshape medicine'.

From George Clooney in ER to Noah Wyle in The Pitt, emergency department doctors have long been popular heroes. But will it soon be time to hang up the scrubs?

A groundbreaking Harvard study has found that AI systems outperformed human doctors in high-pressure emergency medicine triage, diagnosing more accurately in the potentially life-and-death moments when people are first rushed to hospital. The results were described by independent experts as showing "a genuine step forward" in the clinical reasoning of AIs and came as part of trials that tested the responses of hundreds of doctors against an AI. The authors said the results, published in the journal Science, showed large language models (LLMs) "have eclipsed most benchmarks of clinical reasoning".

One experiment focused on 76 patients who arrived at the emergency room of a Boston hospital. An AI and a pair of human doctors were each given the same standard electronic health record to read -- typically including vital-sign data, demographic information and a few sentences from a nurse about why the patient was there. The AI identified the exact or a very close diagnosis in 67% of cases, beating the human doctors, who were right only 50%-55% of the time. The AI's advantage was particularly pronounced in triage circumstances requiring rapid decisions with minimal information.

The diagnostic accuracy of the AI -- OpenAI's o1 reasoning model -- rose to 82% when more detail was available, compared with the 70%-79% accuracy achieved by the expert humans, though this difference was not statistically significant. The AI also outperformed a larger cohort of human doctors when asked to provide longer-term treatment plans, such as setting antibiotic regimens or planning end-of-life care. The AI and 46 doctors were asked to examine five clinical case studies, and the computer made significantly better plans, scoring 89% compared with 34% for humans using conventional resources, such as search engines.

But it is not curtains for emergency doctors yet, the researchers said. The study only tested humans against AIs looking at patient data that can be communicated via text. The AI's reading of signals such as the patient's level of distress and their visual appearance was not tested. That means the AI was performing more like a clinician producing a second opinion based on paperwork.

"I don't think our findings mean that AI replaces doctors," said Arjun Manrai, one of the lead authors of the study, who heads an AI lab at Harvard Medical School. "I think it does mean that we're witnessing a really profound change in technology that will reshape medicine."

Dr Adam Rodman, another lead author and a doctor at Boston's Beth Israel Deaconess medical centre, where the study took place, said AI LLMs were among "the most impactful technologies in decades". Over the next decade, he said, AI would not replace physicians but join them in a new "triadic care model ... the doctor, the patient, and an artificial intelligence system".

In one case in the Harvard study, a patient presented with a blood clot in the lungs and worsening symptoms. The human doctors thought the anticoagulants were failing, but the AI noticed something the humans did not: the patient's history of lupus might be causing the inflammation in the lungs. The AI was proved correct.
Nearly one in five US physicians are already using AI to assist diagnosis, according to research published last month. In the UK, 16% of doctors are using the tech daily and a further 15% weekly, with "clinical decision-making" being one of the most common uses, according to a recent Royal College of Physicians survey. The UK doctors' biggest concerns were AI error and liability risks.

Billions are being invested in AI healthcare companies, but questions remain about the consequences of AI error. "There is not a formal framework right now for accountability," said Rodman, who also stressed patients ultimately "want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions".

Prof Ewen Harrison, co-director of the University of Edinburgh's centre for medical informatics, said the study was important and showed that "these systems are no longer just passing medical exams or solving artificial test cases. They are starting to look like useful second-opinion tools for clinicians, particularly when it is important to consider a wider range of possible diagnoses and avoid missing something important."

Dr Wei Xing, an assistant professor at the University of Sheffield's school of mathematical and physical sciences, said some of the other findings suggested doctors may unconsciously defer to the AI's answer rather than thinking independently. "This tendency could grow more significant as AI becomes more routinely used in clinical settings," he said. He also highlighted the lack of information about which patients the AI was worse at diagnosing and whether it struggled more with elderly patients or non-English speakers. He said: "It does not demonstrate that AI is safe for routine clinical use, nor that the public should turn to freely available AI tools as a substitute for medical advice."
[7]
Study: AI can outperform doctors on diagnosing cases
Artificial intelligence that can "reason" is now capable of diagnosing real-life medical scenarios as well as or better than physicians, according to the results of a study published Thursday in Science.

The researchers used previously unseen clinical cases to test OpenAI's reasoning model o1 against the company's older model, GPT-4, as well as against physicians and medical residents in training. In a range of experiments, the o1 model often improved significantly on GPT-4's diagnostic ability and bested physicians, too. When tested with the electronic health records of random emergency department cases from a Boston hospital, the o1 model was diagnostically accurate more than two-thirds of the time at initial triage. Two expert attending physicians had correct diagnoses roughly half of the time.

Dr. Robert Wachter, professor and chair of the Department of Medicine at the University of California, San Francisco, described the study's findings as "important" and suggested it's now "indisputable" that modern AI will outperform older large language models and doctors when asked to identify the right diagnosis and next step. He was not involved in the study. However, Wachter, author of "A Giant Leap: How AI Is Transforming Healthcare and What That Means for Our Future," added that more research is necessary before AI is fully implemented in clinical practice. "The question is how closely this replicates real life, and the answer is moderately well but not perfectly," Wachter wrote in an email.

As the study's authors acknowledge, the experiments were limited to text-only input and didn't include the visual and auditory clues and cues that doctors often rely on for diagnosis, such as a patient's level of distress and medical imaging. "GenAI can probably begin to integrate these inputs but for now, a test of a written, and often artificially 'clean' clinical case scenario is not the same as going into an ER and dealing with the chaos," Wachter said. "Just watch The Pitt."

Based on their findings, the study's authors highlighted an "urgent" need for further studies and prospective clinical trials to determine how AI systems can improve clinical practice and patient outcomes. "The rapid pace of improvement in LLMs has substantial implications for the science and practice of clinical medicine," wrote the authors, many of whom are based at Boston's Beth Israel Deaconess Medical Center, where the study was conducted.

An accompanying commentary, also published in Science and written by two experts at Flinders Health and Medical Research Institute in Adelaide, Australia, who were not involved in the study, agreed with its urgent implications. They also argued against replacing doctors with AI, instead envisioning a style of collaboration in which clinicians provide oversight, contextual judgment, and accountability. "Without robust demonstrated effectiveness, equity, and safety, many AI systems will remain insufficient for clinical use," the experts wrote.
[8]
Landmark Test of Clinical Reasoning Finds AI Outperformed | Newswise
AI can pass the hardest exams medical school has to offer. But can it handle the real world's inherent messiness? Harvard Medical School and Beth Israel Deaconess Medical Center researchers sought to find out.

Newswise -- BOSTON -- In one of the largest studies to compare artificial intelligence and physicians on a wide array of clinical reasoning tasks, including real emergency department data, a team of physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center evaluated whether an AI system could do what physicians do every day: review a messy patient chart and use that information to determine diagnosis and next steps.

In a new study published April 30, 2026, in Science, co-senior authors Arjun (Raj) Manrai, assistant professor of biomedical informatics at HMS, and Adam Rodman, MD, MPH, a hospitalist and clinical researcher at BIDMC, and their team report that a large language model (LLM) outperformed physicians across many common clinical reasoning tasks, including emergency room decisions, identifying likely diagnoses, and choosing next steps in management. The LLM's performance indicated that long-standing ways of testing medical AI may no longer capture current systems' performance, pointing to a possible turning point for the field.

"We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines," said co-senior author Manrai. "However, this does not mean AI will necessarily improve care -- how and where it should be deployed remain understudied, and we desperately need rigorous prospective trials to evaluate the impact of AI on clinical practice."

"Models are increasingly capable," said Peter Brodeur, MD, MA, the study's co-first author. "We used to evaluate models with multiple-choice tests; now they are consistently scoring close to 100 percent and we can't track progress anymore because we're already at the ceiling."

Incorporating standards first created in the 1950s to train and evaluate doctors, the researchers compared how an AI system performed against hundreds of clinicians. The comparisons included case-study diagnostic challenges, reasoning exercises, and real emergency department cases. In one of their experiments, the investigators tasked the LLM with evaluating patients at various points in a standard emergency department course, ranging from early triage to later admission decisions. At each stage, the model was given only the information available at that point -- drawn directly from real-world electronic health records -- and asked to generate likely diagnoses and suggest what should happen next.

"To better understand real-world performance, we needed to test performance early in the patient course, when clinical data is sparse," said co-first author Thomas Buckley, a doctoral student at the Harvard Kenneth C. Griffin Graduate School of Arts and Sciences, Dunleavy Fellow in HMS's AI in Medicine PhD program, and a member of Manrai's lab.

Unlike many prior studies, the team did not smooth out the messiness of real-world care before testing the AI. The emergency department cases were presented exactly as they appeared in the electronic health record. "We didn't pre-process the data at all," Rodman said. "The model is literally just processing data as it exists in the health record."

At the early decision points in the real-world emergency department cases, the model matched or exceeded attending physicians in diagnostic accuracy. That result surprised even the researchers.
"I thought it was going to be a fun experiment but that it wouldn't work that well," Rodman said. "That was not at all what happened." The results make the case that medical AI is ready to be studied the same way as all new medical interventions -- through carefully controlled clinical trials in real care settings. The researchers are clear that their results do not suggest that AI systems are ready to practice medicine autonomously, or that physicians can be removed from the diagnostic process. "A model might get the top diagnosis right but also suggest unnecessary testing that could expose a patient to harm," said Brodeur. "Humans should be the ultimate baseline when it comes to evaluating performance and safety." About Harvard Medical School Harvard Medical School brings together the brightest minds in science and medicine to improve health and well-being for all. The school and its affiliated hospitals and research institutions are home to 12,000 faculty members and 1,600 medical and graduate students. Together, they function as a magnet, pulling together the best and most passionate researchers, clinicians, students, and changemakers in science, medicine, and health. About Beth Israel Deaconess Medical Center Beth Israel Deaconess Medical Center is a leading academic medical center, where extraordinary care is supported by high-quality education and research. BIDMC is a teaching affiliate of Harvard Medical School and consistently ranks as a national leader among independent hospitals in National Institutes of Health funding. BIDMC is the official hospital of the Boston Red Sox. Beth Israel Deaconess Medical Center is a part of Beth Israel Lahey Health, a healthcare system that brings together academic medical centers and teaching hospitals, community and specialty hospitals, more than 4,700 physicians and 39,000 employees in a shared mission to expand access to great care and advance the science and practice of medicine through groundbreaking research and education.
A groundbreaking study published in Science reveals that OpenAI's o1-preview reasoning model achieved 67.1% diagnostic accuracy on real emergency department cases, surpassing two expert physicians who scored 55.3% and 50%. Researchers at Harvard Medical School and Beth Israel Deaconess Medical Center emphasize that while AI in medicine shows remarkable potential, the findings point toward collaborative care models rather than physician replacement.
A landmark study published in the journal Science demonstrates that AI in medicine has achieved a significant milestone, with OpenAI's o1-preview model matching or exceeding physician-level clinical diagnostic reasoning on authentic medical cases [1]. Researchers led by Arjun Manrai from Harvard Medical School and Adam Rodman from Beth Israel Deaconess Medical Center tested the o1-preview model across six experiments, including 76 actual emergency department cases and 143 complex clinical vignettes published in The New England Journal of Medicine [2].

The results reveal striking advances in AI diagnostic accuracy. When evaluating real emergency room patients at triage, the AI achieved 67.1% exact or very close diagnostic accuracy, while two expert attending physicians scored 55.3% and 50.0%, respectively [4]. Blinded physician reviewers could not distinguish the AI output from human diagnoses. On published clinicopathological conference cases, o1-preview achieved exact or very close accuracy in 88.6% of cases versus GPT-4's 72.9% [1], and on the complex clinical vignettes it included the correct diagnosis in its differential in 78.3% of cases and suggested a helpful diagnosis in 97.9% of cases [4].
The OpenAI o1-preview model represents a new class of reasoning models: large language models (LLMs) enhanced with the capability to work through complex problems step by step before responding, mirroring structured thinking [1]. This deliberative approach proved particularly effective during early-stage triage, when decisions must be made with limited information. The model handled uncertainty far better than human clinicians, using fragmented or unstructured electronic health records and notes more effectively [2].

Rodman described a compelling case in which a patient presented with routine respiratory symptoms after an organ transplant. The AI model suspected a dangerous flesh-eating infection from the very beginning, approximately 12 to 24 hours before human physicians would have become suspicious of the condition [3]. In another instance, when a pulmonary embolism patient's symptoms worsened despite treatment, the AI correctly identified lupus-related heart inflammation as the underlying cause by scanning the medical records [5].
Despite the impressive performance, researchers emphasize that AI outperforms doctors in specific contexts but should not replace them. "I don't think our findings mean that AI replaces doctors, despite what some companies are likely to say," Manrai stated during a press briefing [2]. The prevailing proposal for AI in emergency medicine focuses on collaborative care models, with clinicians providing oversight, contextual judgment, and accountability [1].

Prior research using clinical vignettes found no substantial difference between physicians augmented with GPT-4 and GPT-4 working alone, though both outperformed physicians with only conventional resources [1]. This suggests that determining the optimal implementation requires evaluating AI alone, clinician alone, and clinician with AI -- a critical consideration as clinicians already integrate AI tools into practice, sometimes without institutional oversight [1].
While the study establishes a foundation for authentic evaluation of text-based tasks, real clinical work relies heavily on visual and auditory cues, such as findings from physical examinations [1]. The o1 models were limited to text-only input and currently underperform on most medical imaging benchmarks [4]. Newer multimodal AI systems such as GPT-5.3 and Gemini 3.1 Pro can process text, images, audio, and video together, potentially enabling assessments that more closely mirror actual clinical diagnosis [1].

Separate research by Arya Rao at Harvard Medical School identified a persistent weak point in AI reasoning: considering several different uncertain diagnoses. LLM-based models tend to jump to conclusions, with reasoning that is "brittle precisely where uncertainty and nuance matter most" [3]. Concerns about AI hallucinations and patient safety also persist, with researchers noting that AI can spontaneously develop unexpected behaviors and provide incorrect information [4].

The findings indicate an urgent need to understand how these tools can be safely integrated into clinical workflows through prospective clinical trials [1]. "We're witnessing a really profound change in technology that will reshape medicine, and we need to evaluate this technology now, and rigorously conduct in prospective clinical trials," Manrai emphasized [2]. Regulators, hospitals, and healthcare providers must work together to test these tools thoroughly before deployment to ensure safety and equity for all patients [2].

Researchers at Flinders University wrote in a concurrent Science commentary that "we do not allow doctors to practice without supervision and evaluation, and AI should be held to comparable standards" [2]. As of 2025, 1 in 5 doctors and nurses worldwide used AI for a second opinion on complex cases, with over half wanting to use it for this purpose [3]. With such widespread interest, establishing decision support systems that balance AI capabilities with human expertise becomes critical for the future of medicine.