May 3, 2026
In Harvard study, AI offered more accurate diagnoses than emergency room doctors
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases, where at least one model appeared to be more accurate than human physicians.

TL;DR
- A new study compared the diagnostic accuracy of OpenAI's large language models (o1 and 4o) with human physicians in emergency room cases.
- The o1 model performed on par with or better than two attending physicians in diagnosing 76 emergency room patients.
- In initial ER triage, the o1 model achieved exact or close diagnoses in 67% of cases, compared to 55% and 50% for the two physicians.
- Researchers did not pre-process the data, providing AI models with the same information available in electronic medical records.
- The study highlights an urgent need for prospective trials to evaluate AI in real-world patient care, while acknowledging limitations with non-textual inputs.
- Concerns were raised about the lack of a formal accountability framework for AI diagnoses and patient preference for human guidance in critical decisions.