A study published June 12, 2026, in Nature Medicine found that general-purpose large language models consistently outperformed dedicated clinical AI products across standardized medical tasks. The general-purpose models were also preferred by the clinicians using them.
What the study actually tested
The researchers pitted three major general-purpose LLMs against purpose-built medical tools. On one side: OpenAI’s GPT-5.2, Google’s Gemini 3.1 Pro Preview, and Anthropic’s Claude Opus 4.6. On the other: dedicated clinical products like OpenEvidence and UpToDate Expert AI, tools specifically designed and marketed for healthcare professionals.
The test set included questions from MedQA, a well-established benchmark of medical knowledge drawn from medical licensing exams. The general-purpose models excelled across these tasks, beating the specialist tools on their home turf.
Google Search AI Overview was included as a control, representing the kind of quick-reference tool physicians actually reach for during a busy shift.
A pattern that keeps repeating
The result is not a one-off. A February 2025 study found that chatbots outperformed physicians who were limited to internet references for clinical decision-making.
Then came a randomized controlled study published February 9, 2026, involving 1,298 participants in the UK. Standalone LLMs achieved 94.9% accuracy in identifying medical conditions. But when participants worked alongside the LLMs, their combined performance did not surpass that of the control group.
Why this matters beyond healthcare
The researchers themselves identified a gap between high benchmark performance and real-world clinical applicability. Regulatory compliance, electronic health record integration, and liability frameworks do not show up in a MedQA score.
But clinician preference is hard to dismiss. If doctors actively prefer using GPT-5.2 over a tool built specifically for them, that’s a market signal, not just a research finding.
Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
