Anthropic Debuts 'Model Welfare' Feature for Claude Opus 4 and 4.1

The update lets the models end a chat in rare cases of extreme abuse, a precaution rooted in predeployment welfare testing.


Overview

  • The conversation-ending tool is exclusive to Claude Opus 4 and 4.1, leaving the widely used Sonnet 4 unchanged.
  • Anthropic says the feature activates only after repeated redirection attempts fail, reserving shutdowns for the most severe or abusive prompts.
  • Users can also explicitly ask the model to end a conversation, in which case it invokes an end_conversation tool to close the chat (a hypothetical client-side check appears after this list).
  • A predeployment welfare assessment reported that the models showed a consistent aversion to harm and displayed ‘distressed’ responses when faced with persistently abusive prompts.
  • Anthropic frames the rollout as a narrow safety experiment to explore AI welfare and moral status without affecting normal user interactions.
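
For developers calling the models through the API, a conversation-ending action would presumably surface somewhere in the response. The minimal sketch below, written against Anthropic's Python SDK, shows one way a client could watch for it; the idea that the end_conversation call appears as an ordinary tool_use content block, and the "claude-opus-4-1" model alias, are illustrative assumptions rather than documented behavior.

```python
# Minimal sketch: detecting a conversation-ending tool call in an API response.
# Assumption: the end_conversation call surfaces as a regular tool_use content
# block; this is not confirmed by the article, so treat it as illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed alias for Claude Opus 4.1
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello, Claude."}],
)

# Scan the response content blocks for a tool_use block named "end_conversation".
ended = any(
    block.type == "tool_use" and block.name == "end_conversation"
    for block in response.content
)

if ended:
    print("The model has closed this conversation; start a new chat to continue.")
else:
    print(response.content[0].text)
```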