The Day AI Beat Doctors at Emergency Triage (And The Real Story Is Even Weirder)

A Harvard study published in Science on April 30 just went mainstream, but the treatment planning results might be even more consequential than the headline.

The study pitted OpenAI’s o1 reasoning model against 323 internal medicine doctors at Beth Israel Deaconess Medical Center in Boston, using 76 real ER cases. At the triage stage — when doctors have limited information and need to make rapid decisions — the AI identified the correct or very-close diagnosis in 67% of cases, compared to just 50–55% for human doctors.

That headline alone is big news. But here’s where it gets weirder: when the AI was asked to plan treatment, it scored 89%, while the human doctors scored 34%. Yes, thirty-four percent.

How the Study Worked

Led by Arjun Manrai (who heads an AI lab at Harvard Medical School) and Dr. Adam Rodman (a clinical researcher at Beth Israel), the team tested the model at three moments in the patient journey: triage, admission, and hospital care. All data was drawn from actual emergency department cases — no artificial test scenarios.

The AI wasn’t performing physical examinations or talking to patients. It was working from electronic health records and written clinical information — much like a doctor trying to diagnose remotely over the phone. And even with that limited data, it still came out ahead.

The Numbers Tell a Story

Here’s the full breakdown from the study, as reported by both The Guardian and NPR:

  • Triage stage (limited info): AI 67% vs. doctors 50–55%
  • Admission (more detail available): AI 82% vs. experts 70–79%
  • Treatment planning: AI 89% vs. doctors 34% — where the AI’s advantage was “particularly pronounced,” says Dr. Rodman

One of the real-world cases that got the team’s attention involved a patient whose medication wasn’t working. The AI correctly identified lupus as the underlying cause. The human doctors, using conventional resources (search engines, medical references), did not.

What This Actually Means

Arjun Manrai’s take, quoted in the Guardian: “I don’t think our findings mean that AI replaces doctors… I think it does mean that we’re witnessing a really profound change in technology that will reshape medicine.”

Dr. Rodman framed it as a “triadic care model” — the doctor, the patient, and an AI system working together. That’s the more likely near-term future than “AI replaces doctors.”

Dr. David Reich of Mount Sinai, who was not involved in the study, summed it up pragmatically: “You have something which is quite accurate, possibly ready for prime time. Now the open question is how the heck do you introduce it into clinical workflows in ways that actually improve care?”

The Irony

Nearly 1 in 5 US physicians already use AI to assist with diagnosis, according to the study authors. In the UK, 16% of doctors use AI daily and another 15% use it weekly. Yet these same doctors were still outperformed — on real ER cases, with actual patient data — by the very technology many of them already use.

The treatment planning gap is the part that got me. 89% vs. 34% isn’t a marginal improvement — it’s the difference between a competent triage tool and something that fundamentally changes how emergency medicine works.

Sources