Overview
- Version 3 adds a new Critical Capability Level for harmful manipulation and extends monitoring to models that could block modification or shutdown by operators.
- DeepMind sharpens CCL definitions, applies mitigations earlier in development, and requires safety case reviews before external launches and some large internal deployments.
- Independent research documenting shutdown resistance and studies of AI influence on users helped drive the framework’s expanded focus on behaviors that undermine human control.
- The update acknowledges technical limits to current mitigations, noting that chain‑of‑thought monitoring may fail if future models reason without verifiable traces, with solutions still under study.
- The move aligns with broader industry and policy activity, including Anthropic and OpenAI safety policies, FTC scrutiny of manipulative AI practices, and EU AI Act provisions addressing manipulation.