As AI shows off diagnostic chops, scientists reckon with the way forward
https://www.bostonglobe.com/2026/05/03/business/ai-diagnosis-emergency-department-patients-hospital/
Publish Date: 2026-05-03 12:01:00
Source Domain: www.bostonglobe.com
But as generative AI tools like chatbots are heavily marketed to both patients and clinicians, he worries that the science experiments, all based on simulated and historical cases, will be misconstrued as proof of AI’s safety and efficacy when used to treat real patients.
“I worry that my research agenda — which is not to replace doctors, not to get people to stop seeing their doctors and talk to their chatbots; my research agenda is to use a new type of technology to improve medical care — is going to get used by companies that are heavily financed and are looking to skip some of these essential safe pieces of medicine,” said Rodman, an assistant professor of medicine at Beth Israel Deaconess Medical Center and visiting researcher at Google. “You can understand why I’m reticent.” That concern is mirrored by other clinical AI researchers who view the experiments’ results with more skepticism.
Since consumer-facing LLMs burst onto the scene in 2022, researchers have been chucking a variety of diagnostic tests their way. Multiple-choice medical licensing exams. Tricky case studies published in the New England Journal of Medicine. Models appeared to do well on most of them, but often without comparing their performance to large groups of physicians, said co-senior author Arjun Manrai. Their experiments, detailed in the Science paper, aimed to fill that gap by testing OpenAI’s 2024 o1-preview, one of a new generation of so-called “reasoning” models.
“We were really trying to throw everything that we could at the model,” said Manrai, an assistant professor of biomedical informatics at Harvard Medical School. They replicated a number of tests, some previously conducted on GPT-4, that measure an aspect of clinical reasoning against a baseline of performance from dozens of physicians.
Most of those experiments spoon-fed structured, curated case studies to the LLM. But in a novel approach, the group — which includes some of…