Anthropic Lets Claude Opus Models End Extremely Harmful Conversations

Anthropic aims to protect its most advanced models by allowing them to autonomously exit conversations in which users persistently push harmful or abusive requests.

Overview

  • The feature is now rolling out on Claude Opus 4 and 4.1 but is not being added to the widely used Sonnet 4 model.
  • It activates only as a last resort, after multiple attempts to redirect the conversation have failed, in extreme edge cases such as requests for sexual content involving minors or instructions enabling mass violence.
  • When Claude ends a chat, that thread closes to new messages, but users can immediately start a fresh conversation or edit and resubmit earlier prompts.
  • Anthropic says the vast majority of users will never experience a forced termination and that the AI will not cut off chats where users appear to be at imminent risk of harming themselves or others.
  • The company frames the change as part of an experimental “model welfare” program, inviting user feedback and noting uncertainty about any potential moral status of LLMs.