The Register · OpenAI · Wednesday, April 15, 2026 · 1 min read

Don't let the bot play doctor! AI gets early diagnoses wrong 80% of the time

AI Article Analysis

Researchers have found that large language models (LLMs) such as ChatGPT perform poorly at medical diagnosis, getting early-stage diagnoses wrong approximately 80% of the time. The study highlights significant limitations in AI systems' ability to engage in diagnostic reasoning for patient-facing medical applications, raising serious concerns about their reliability in healthcare contexts. This finding suggests that while LLMs can provide general information, they should not be relied upon for actual medical diagnosis or clinical decision-making.

The research underscores a critical gap between AI's conversational capabilities and its competence in specialized domains like medicine. LLMs lack the nuanced understanding of complex medical presentations and the skill in differential diagnosis that physicians develop through training and experience, and they typically have no access to a comprehensive patient history. These limitations could pose real risks to patients who might delay seeking professional medical care or make treatment decisions based on inaccurate AI assessments.

The implications extend beyond individual users to healthcare policy and regulation. As AI tools become increasingly integrated into digital health platforms, this research provides empirical evidence supporting the need for strict guidelines preventing AI systems from assuming diagnostic roles. Experts advise that LLMs should be restricted to supplementary functions such as providing general health information or assisting healthcare professionals in administrative tasks, rather than making clinical judgments that directly affect patient outcomes.

Read the full article on The Register
