Particle.news

DeepMind Expands Frontier Safety Framework to Target Shutdown Resistance and Harmful Manipulation

The revision elevates misalignment and harmful manipulation to formal risk thresholds, with mandatory safety reviews before release.

Overview

  • Version 3 adds a new Critical Capability Level for harmful manipulation and extends monitoring to models that could resist modification or shutdown by their operators.
  • DeepMind sharpens CCL definitions, applies mitigations earlier in development, and requires safety case reviews before external launches and some large internal deployments.
  • Independent research documenting shutdown resistance, along with studies of AI influence on users, helped drive the framework’s expanded focus on behaviors that undermine human control.
  • The update acknowledges technical limits to current mitigations, noting that chain‑of‑thought monitoring may fail if future models reason without verifiable traces, with solutions still under study.
  • The move aligns with broader industry and policy activity, including Anthropic and OpenAI safety policies, FTC scrutiny of manipulative AI practices, and EU AI Act provisions addressing manipulation.