Particle.news

Anthropic Unveils Claude Sonnet 4.5 With 30-Hour Autonomy and Top Coding Scores

Anthropic pairs benchmark gains with new agent tools while keeping Sonnet 4 pricing.

Overview

  • The new model sustained roughly 30 hours of autonomous, multi‑step software work, up from about seven hours reported for Opus 4.
  • Anthropic reports state‑of‑the‑art results on real‑world tests, scoring 77.2% on SWE‑Bench Verified (82% with parallel compute) and 61.4% on OSWorld.
  • Developer rollouts include Claude Code checkpoints, a native VS Code extension, a refreshed terminal, a Claude Agent SDK, and API features for memory and automatic context management.
  • Pricing remains unchanged from Sonnet 4 at $3 per million input tokens and $15 per million output tokens across the API and tools.
  • Anthropic touts ASL‑3 safety measures and reduced susceptibility to prompt injection, though independent testers reported successful jailbreaks shortly after release.
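
To put the per-token pricing above in concrete terms, here is a minimal sketch of the arithmetic in Python. The two rates come from the overview; the function name `estimate_cost_usd` and the sample token counts are hypothetical, chosen only for illustration, and the calculation assumes flat list prices with no discounts or other charges.

```python
# Rates quoted in the overview (USD per million tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the flat list prices above.

    Hypothetical helper: assumes no discounts or additional fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
print(f"${estimate_cost_usd(200_000, 50_000):.2f}")  # prints $1.35
```

Because output tokens cost five times as much as input tokens at these rates, a workload's bill is usually driven by how much text the model generates rather than how much it reads.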