Particle.news
DeepMind Expands Frontier Safety Framework to Target Shutdown Resistance and Harmful Manipulation

The revision elevates misalignment and harmful manipulation to formal risk thresholds, with mandatory safety reviews before release.

Overview

  • Version 3 adds a new Critical Capability Level for harmful manipulation and extends monitoring to models that could block modification or shutdown by operators.
  • DeepMind sharpens CCL definitions, applies mitigations earlier in development, and requires safety case reviews before external launches and some large internal deployments.
  • Independent research documenting shutdown resistance, along with studies of AI influence on users, helped drive the framework’s expanded focus on behaviors that undermine human control.
  • The update acknowledges technical limits to current mitigations, noting that chain‑of‑thought monitoring may fail if future models reason without verifiable traces, with solutions still under study.
  • The move aligns with broader industry and policy activity, including Anthropic and OpenAI safety policies, FTC scrutiny of manipulative AI practices, and EU AI Act provisions addressing manipulation.