Why AI health chatbots won’t make you better at diagnosing yourself – new research

https://theconversation.com/why-ai-health-chatbots-wont-make-you-better-at-diagnosing-yourself-new-research-278049

Publish Date: 2026-03-31 10:52:00

Source Domain: theconversation.com

Millions of people are turning to artificial intelligence (AI) chatbots for advice on everything from cooking to tax returns. Increasingly, they are also asking chatbots about their health.

But as the UK’s chief medical officer recently warned, that may not be wise when it comes to medical decisions. In a recent study, colleagues and I tested how well large language model (LLM) chatbots help the public deal with common health problems. The results were striking.

The chatbots we tested were not ready to act as doctors. A common response to studies like this is that AI moves faster than academic publishing. By the time a paper appears, the models tested may already have been updated. But studies using newer versions of these systems for patient triage suggest the same problems remain.

We gave participants brief descriptions of common medical situations. They were randomly assigned either to use one of three widely available chatbots or to rely on whatever sources they would normally use at home. For each scenario, we then asked two questions: what condition might explain the symptoms, and where should they seek help?

People who used chatbots were less likely to identify the correct condition than those who didn’t. They were also no better at determining the right place to seek care than the control group. In other words, interacting with a chatbot did not help people make better health decisions.

Strong knowledge, weak outcomes

This does not mean the models lack medical knowledge: LLMs can pass medical licensing exams with ease. When we removed the human element and gave the same scenarios directly to the chatbots, their performance improved dramatically. Without human involvement, the models identified relevant conditions in the vast majority of cases and often suggested appropriate levels of care.

So why did the results deteriorate when people actually used the systems? When we looked at the conversations, the problems…

Source