AI model outperforms two ER physicians on emergency-room diagnoses in Harvard study
A new Harvard study evaluated large language models on real emergency room cases and found that at least one model produced more accurate diagnoses than two experienced ER physicians. The result represents a promising step toward AI systems that can help clinicians make faster, more accurate decisions in high-pressure, time-sensitive environments.
The researchers compared model outputs to clinician diagnoses across a range of acute presentations. In multiple cases, the best-performing model ranked the correct diagnosis higher, or included it more consistently, than the physicians it was measured against. Although the study is a research-grade evaluation rather than a deployment, the findings suggest that modern LLMs can capture and synthesize clinical information effectively when presented with emergency case data.
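For readers curious how "ranking the correct diagnosis higher" can be quantified, the sketch below shows two common ways to score ranked diagnosis lists: top-k accuracy and mean reciprocal rank. This is an illustration only. The article does not describe the study's actual scoring method, and the function names and toy cases here are hypothetical, not data from the study.

```python
from typing import List, Optional, Sequence, Tuple

# A case pairs a ranked differential (most likely first) with the confirmed diagnosis.
Case = Tuple[Sequence[str], str]


def rank_of_correct(differential: Sequence[str], truth: str) -> Optional[int]:
    """Return the 1-based position of the confirmed diagnosis, or None if absent."""
    normalized = [d.strip().lower() for d in differential]
    target = truth.strip().lower()
    return normalized.index(target) + 1 if target in normalized else None


def top_k_accuracy(cases: List[Case], k: int) -> float:
    """Fraction of cases whose confirmed diagnosis appears in the first k entries."""
    hits = 0
    for differential, truth in cases:
        rank = rank_of_correct(differential, truth)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(cases)


def mean_reciprocal_rank(cases: List[Case]) -> float:
    """Average of 1/rank per case; a missing diagnosis contributes 0."""
    total = 0.0
    for differential, truth in cases:
        rank = rank_of_correct(differential, truth)
        total += 1.0 / rank if rank is not None else 0.0
    return total / len(cases)


# Hypothetical toy cases, invented purely for illustration.
model_cases: List[Case] = [
    (["pulmonary embolism", "pneumonia", "acute coronary syndrome"], "pulmonary embolism"),
    (["gastroenteritis", "appendicitis", "ovarian torsion"], "appendicitis"),
    (["migraine", "tension headache"], "subarachnoid hemorrhage"),
]

if __name__ == "__main__":
    print("top-1 accuracy:", top_k_accuracy(model_cases, 1))
    print("top-3 accuracy:", top_k_accuracy(model_cases, 3))
    print("mean reciprocal rank:", mean_reciprocal_rank(model_cases))
```

Scoring a model's lists and a physician's lists on the same cases with the same function gives a like-for-like comparison. Top-k accuracy captures how consistently the correct diagnosis appears, while mean reciprocal rank rewards placing it nearer the top.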
The practical implications are easy to see, even though they remain prospective: AI could act as a decision-support partner that helps reduce missed diagnoses, accelerate triage, and surface less obvious differential diagnoses for clinicians to review. Potential use cases include:
- Real-time diagnostic suggestions during triage or initial assessment
- Automated summarization of patient history and test results to highlight key risks
- Second-opinion prompts that expand differential diagnoses for complex or atypical presentations
The researchers emphasize that AI is not a replacement for physicians but a complementary tool. The study authors call for further prospective trials, integration testing in live workflows, and robust safeguards (such as human oversight, explainability features, and monitoring) to ensure safety and equity. Still, the result is a notable win for AI in healthcare: it shows measurable diagnostic value on real emergency cases, which could translate into better outcomes in emergency medicine once systems are validated and responsibly deployed.