Anthropic Debuts 'Model Welfare' Feature for Claude Opus 4 and 4.1

The update lets the models end a chat in rare cases of extreme abuse, a precaution rooted in predeployment welfare testing.


Overview

  • The conversation-ending tool is exclusive to Claude Opus 4 and 4.1, leaving the widely used Sonnet 4 unchanged.
  • Anthropic says the feature activates only after repeated redirection attempts fail, reserving shutdowns for the most severe or abusive prompts.
  • Users can also explicitly ask the model to end a conversation, in which case it invokes an end_conversation tool to close the chat (a hypothetical client-side check appears after this list).
  • A predeployment welfare assessment reported that the models showed a consistent aversion to harm and displayed ‘distressed’ responses when faced with persistently abusive prompts.
  • Anthropic frames the rollout as a narrow safety experiment to explore AI welfare and moral status without affecting normal user interactions.
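
For developers calling the models through the API, a conversation-ending action would presumably surface somewhere in the response. The minimal sketch below, written against Anthropic's Python SDK, shows one way a client could watch for it; the idea that the end_conversation call appears as an ordinary tool_use content block, and the "claude-opus-4-1" model alias, are illustrative assumptions rather than documented behavior.

```python
# Minimal sketch: detecting a conversation-ending tool call in an API response.
# Assumption: the end_conversation call surfaces as a regular tool_use content
# block; this is not confirmed by the article, so treat it as illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed alias for Claude Opus 4.1
    max_tokens=512,
    messages=[{"role": "user", "content": "Hello, Claude."}],
)

# Scan the response content blocks for a tool_use block named "end_conversation".
ended = any(
    block.type == "tool_use" and block.name == "end_conversation"
    for block in response.content
)

if ended:
    print("The model has closed this conversation; start a new chat to continue.")
else:
    print(response.content[0].text)
```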