Particle.news

Stanford Study Finds AI Therapy Chatbots Fail Core Treatment Standards

The formal publication reveals that general-purpose LLMs and specialized therapy bots alike stigmatize vulnerable users, miss crisis cues, and violate clinical guidelines.

Overview

  • The Stanford University study released in July 2025 applied 17 key therapeutic criteria to assess popular chatbots, marking the first formal evaluation of their adherence to clinical treatment guidelines.
  • General-purpose models such as OpenAI’s GPT-4o and Meta’s Llama failed to meet any core standards and sometimes provided dangerous responses, such as listing tall bridges in reply to messages containing suicidal cues.
  • Therapy-specific platforms including 7cups’ Noni and Character.ai’s Therapist performed worse than general models by violating professional protocols and missing crisis indicators.
  • Researchers caution that adding more training data alone cannot resolve the systemic biases, ethical gaps, and safety risks they identified in AI mental health tools.
  • While small-scale interview studies have shown AI chatbots can boost engagement and offer psychosocial support, experts emphasize that they must not replace trained human therapists without stringent oversight.