Particle.news

Study Finds Nearly Half of AI Health Answers Are Problematic

Researchers say widespread use without guardrails risks patient harm.

Overview

  • An audit in BMJ Open tested five popular chatbots on 250 health prompts and found 49.6% of answers were problematic, with about 30% rated somewhat problematic and roughly 20% judged highly problematic.
  • When asked for alternatives to chemotherapy, the bots often listed unproven options like acupuncture, herbal remedies, and “cancer‑fighting” diets and, in some cases, pointed to clinics offering alternative cancer treatments.
  • The study evaluated ChatGPT, Gemini, Grok, Meta AI, and DeepSeek and reported broadly similar results across models, with Grok performing worst and accuracy varying by topic.
  • The chatbots replied with confident language, rarely refused to answer (only two refusals in 250 prompts), and frequently produced incomplete or fabricated citations, with a median reference completeness of about 40%.
  • Polling shows roughly one in four U.S. adults now use AI for health advice, and clinicians warn that misleading answers can steer patients away from proven care and even cause distress, for example by giving false prognoses.