Overview
- The Stanford University study released in July 2025 applied 17 key therapeutic criteria to assess popular chatbots, marking the first formal evaluation of their adherence to clinical treatment guidelines.
- General-purpose models such as GPT-4o, the model behind ChatGPT, and Meta’s Llama failed to meet any of the core standards and sometimes supplied dangerous information, such as listing tall bridges in response to prompts with clear suicidal cues.
- Therapy-specific platforms, including 7 Cups’ Noni and Character.ai’s Therapist, performed even worse than the general-purpose models, violating professional protocols and missing crisis indicators.
- Researchers caution that increasing training data alone cannot resolve the systemic biases, ethical gaps and safety risks they identified in AI mental health tools.
- While small-scale interviews have shown that AI chatbots can boost engagement and offer psychosocial support, experts emphasize that such tools must not replace trained human therapists without stringent oversight.