Overview
- The conversation-ending tool is exclusive to Claude Opus 4 and 4.1, leaving the widely used Sonnet 4 unchanged.
- Anthropic says the feature activates only after repeated redirection attempts fail, reserving shutdowns for the most severe or abusive prompts.
- Users can also explicitly ask the model to end a conversation, in which case it invokes the `end_conversation` tool to close the chat.
- A pre-deployment welfare assessment reported that the models showed a consistent aversion to harm and gave "distressed" responses under persistent abusive prompting.
- Anthropic frames the rollout as a narrow safety experiment exploring AI welfare and moral status, one intended not to affect normal user interactions.