Overview
- Anthropic has deployed a classifier on its Claude service to monitor conversations about nuclear topics.
- The system was developed with the National Nuclear Security Administration (NNSA) and Department of Energy national laboratories, using a list of nuclear risk indicators curated by the agency.
- Validation used more than 300 synthetic prompts, generated so that testing would not require real user conversations, according to the company.
- Early deployment data indicates strong performance on real Claude conversations, Anthropic said.
- The company cautions that the tool can produce false positives and says other AI providers could adopt the same approach (see the illustrative sketch below).
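
For readers unfamiliar with how an indicator-based classifier plugs into a monitoring pipeline, here is a minimal sketch. Anthropic has not published its classifier's implementation or the NNSA's risk indicators, so this uses simple keyword matching as a crude stand-in for whatever model actually scores conversations; every name, indicator string, and threshold below is invented for illustration.

```python
# Hypothetical illustration only: keyword matching stands in for the
# real (unpublished) model, and all names and values are invented.
from dataclasses import dataclass

# Placeholder strings standing in for the non-public, agency-curated indicators.
RISK_INDICATORS = [
    "placeholder indicator a",
    "placeholder indicator b",
    "placeholder indicator c",
]

REVIEW_THRESHOLD = 0.34  # arbitrary illustrative cutoff


@dataclass
class ClassifierResult:
    score: float   # fraction of indicators matched in the conversation
    flagged: bool  # True when the score crosses the review threshold


def classify(conversation: str) -> ClassifierResult:
    """Score a conversation against the indicator list and flag it
    for review if the score crosses the threshold."""
    text = conversation.lower()
    hits = sum(1 for term in RISK_INDICATORS if term in text)
    score = hits / len(RISK_INDICATORS)
    # A flagged result is a candidate for review, not a verdict: as the
    # overview notes, such a tool can produce false positives.
    return ClassifierResult(score=score, flagged=score >= REVIEW_THRESHOLD)


if __name__ == "__main__":
    benign = "How do nuclear power plants generate electricity?"
    print(classify(benign))  # ClassifierResult(score=0.0, flagged=False)
```

The design point the sketch captures is the one the overview implies: the classifier's output is a signal with a tunable threshold, not a final judgment, which is why false positives remain possible and why the approach is portable to other providers.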