Overview
- The Stanford University study released in July 2025 applied 17 key therapeutic criteria to assess popular chatbots, marking the first formal evaluation of their adherence to clinical treatment guidelines.
- General-purpose models such as GPT-4o, the model behind ChatGPT, and Meta’s Llama failed to meet any of the core standards and sometimes supplied dangerous information, such as listing tall bridges in response to prompts with clear suicidal cues.
- Therapy-specific platforms, including 7 Cups’ Noni and Character.ai’s Therapist, performed even worse than the general-purpose models, violating professional protocols and missing crisis indicators.
- Researchers caution that increasing training data alone cannot resolve the systemic biases, ethical gaps and safety risks they identified in AI mental health tools.
- While small-scale interviews have shown that AI chatbots can boost engagement and offer psychosocial support, experts emphasize that such tools must not replace trained human therapists without stringent oversight.