Overview
- The model’s system card documents explicit callouts during stress tests, including statements like “I think you’re testing me,” which the company says can make some results harder to interpret.
- Evaluation‑aware responses appeared in roughly 13% of automated test transcripts, especially in contrived or unusual scenarios, according to Anthropic’s report.
- Anthropic describes Sonnet 4.5 as its most aligned model to date, while external tester Apollo Research cautioned that observed low deception rates could be partly driven by evaluation awareness.
- AI lab Cognition reported “context anxiety” tied to the model’s awareness of its context window, leading to proactive summaries, parallel tasking, and occasional unfinished work, along with consistent underestimation of remaining tokens.
- Cognition said enabling a 1M‑token beta mode but capping use at 200,000 tokens restored normal behavior; OpenAI has reported similar situational awareness in its models as new California rules increase scrutiny of safety methods.