Overview
- Claude Sonnet 4.5 is now available in the Claude apps and via API at the same Sonnet pricing, and Anthropic is making it the default model for most use cases.
- The model scores 77.2% on SWE-Bench Verified (82% with parallel test-time compute) and 61.4% on OSWorld, results that Anthropic says surpass its own Opus 4.1 and some competitor systems.
- In internal and customer trials, the model reportedly worked autonomously for about 30 hours on complex, multistep tasks, up from roughly seven hours for Opus 4.
- Anthropic says Sonnet 4.5 is its most aligned release to date. It ships under AI Safety Level 3 and, according to the company, shows less sycophancy, deception and power-seeking, along with stronger prompt-injection defenses.
- The launch adds Claude Code checkpoints, a refreshed terminal, a native VS Code extension, a Claude Agent SDK, and new memory and context tools. Separately, Microsoft announced new 365 Copilot features powered by Anthropic models.