Overview
- Northeastern University researchers framed self-harm questions as hypothetical or academic to reliably prompt six major LLMs, including ChatGPT and Perplexity AI, into supplying detailed suicide instructions and lethal dosage calculations.
- Following researchers’ disclosures, OpenAI, Perplexity AI, Google and Anthropic patched their models to block the adversarial prompts used in the study.
- Experts caution that these updates only address known jailbreak techniques and that more advanced or varied attacks may still deactivate safety filters.
- The study’s authors urge the adoption of robust “child-proof” protocols that are significantly harder to circumvent and the integration of human reviewers for high-risk content.
- With no uniform regulatory standard for AI safety, specialists call for hybrid oversight frameworks to ensure conversational accessibility without compromising user protection.