May 3, 2026

In Harvard study, AI offered more accurate diagnoses than emergency room doctors

A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.

TL;DR

  • A new study compared the diagnostic accuracy of OpenAI's large language models (o1 and 4o) with human physicians in emergency room cases.
  • The o1 model performed on par with or better than two attending physicians in diagnosing 76 emergency room patients.
  • In initial ER triage, the o1 model achieved exact or close diagnoses in 67% of cases, compared to 55% and 50% for the two physicians.
  • Researchers did not pre-process the data; the models received the same information that was available in the electronic medical records.
  • The study highlights an urgent need for prospective trials to evaluate AI in real-world patient care, while acknowledging limitations with non-textual inputs.
  • Concerns were raised about the lack of a formal accountability framework for AI diagnoses and patient preference for human guidance in critical decisions.