May 3, 2026
In Harvard study, AI offered more accurate diagnoses than emergency room doctors
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases, where at least one model appeared to be more accurate than human physicians.

TL;DR
- A new study compared the diagnostic accuracy of OpenAI's large language models (o1 and 4o) with human physicians in emergency room cases.
- The o1 model performed on par with or better than two attending physicians in diagnosing 76 emergency room patients.
- In initial ER triage, the o1 model achieved exact or close diagnoses in 67% of cases, compared to 55% and 50% for the two physicians.
- Researchers did not pre-process the data, providing AI models with the same information available in electronic medical records.
- The study highlights an urgent need for prospective trials to evaluate AI in real-world patient care, while acknowledging limitations with non-textual inputs.
- Concerns were raised about the lack of a formal accountability framework for AI diagnoses and patient preference for human guidance in critical decisions.