Anthropic’s Claude Opus 4 and 4.1 Can Now End Harmful Chats

The company describes the change as a “model welfare” experiment, invoked only after repeated attempts to redirect the conversation fail.

Overview

  • The capability is rolling out now to the top-tier Opus 4 and 4.1 models via paid plans and the API, with Sonnet 4 excluded.
  • Claude will end a conversation only in extreme edge cases, such as repeated requests for sexual content involving minors or for instructions enabling large-scale violence, and only as a last resort.
  • When a conversation is ended, that thread is closed to new messages, but users can start a new chat immediately or edit and branch from prior prompts.
  • Anthropic says the tool will not trigger when users may be at imminent risk of self-harm or harming others, and crisis responses are handled under protocols developed with Throughline.
  • Anthropic frames the feature as part of its AI welfare research, citing pre-deployment tests in which the model showed a strong aversion to harmful tasks and signs of apparent distress, and it is soliciting user feedback during the experimental rollout.