Overview
- The new model sustained roughly 30 hours of autonomous, multi‑step software work, up from about seven hours reported for Opus 4.
- Anthropic reports state‑of‑the‑art results on real‑world tests, with the model scoring 77.2% on SWE‑bench Verified (82% with parallel compute) and 61.4% on OSWorld.
- Developer rollouts include Claude Code checkpoints, a native VS Code extension, a refreshed terminal, a Claude Agent SDK, and API features for memory and automatic context management.
- Pricing remains unchanged from Sonnet 4 at $3 per million input tokens and $15 per million output tokens across the API and tools.
- Anthropic touts ASL‑3 safety measures and reduced susceptibility to prompt injection, though independent testers reported successful jailbreaks shortly after release.
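
To make the quoted pricing concrete, the per-million-token rates can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the helper name `estimate_cost_usd` and the token counts are assumptions for this example, not part of any Anthropic SDK, and it ignores any pricing terms the figures above do not cover.

```python
from decimal import Decimal

# Rates quoted in the overview, in USD per million tokens.
INPUT_RATE_PER_MTOK = Decimal("3")
OUTPUT_RATE_PER_MTOK = Decimal("15")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimate the cost of one request in USD.

    Uses only the flat input and output rates listed above.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_RATE_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_RATE_PER_MTOK
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Assumed workload: 200,000 input tokens and 20,000 output tokens.
    print(estimate_cost_usd(200_000, 20_000))
```

At these rates output tokens cost five times as much as input tokens, so a long agentic session that generates a lot of text is dominated by the output side of the sum.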