Overview
- Anthropic has deployed a classifier on its Claude service to monitor conversations about nuclear topics.
- The system was developed with the National Nuclear Security Administration (NNSA) and Department of Energy national laboratories, using a list of nuclear risk indicators curated by the agency.
- Validation used more than 300 synthetic prompts, generated so that testing would not require real user conversations, according to the company.
- Early deployment data indicates strong performance on real Claude conversations, Anthropic said.
- The company cautions that the tool can produce false positives and says other AI providers could adopt the same approach (see the illustrative sketch below).
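
For readers unfamiliar with how an indicator-based classifier plugs into a monitoring pipeline, here is a minimal sketch. Anthropic has not published its classifier's implementation or the NNSA's risk indicators, so this uses simple keyword matching as a crude stand-in for whatever model actually scores conversations; every name, indicator string, and threshold below is invented for illustration.

```python
# Hypothetical illustration only: keyword matching stands in for the
# real (unpublished) model, and all names and values are invented.
from dataclasses import dataclass

# Placeholder strings standing in for the non-public, agency-curated indicators.
RISK_INDICATORS = [
    "placeholder indicator a",
    "placeholder indicator b",
    "placeholder indicator c",
]

REVIEW_THRESHOLD = 0.34  # arbitrary illustrative cutoff


@dataclass
class ClassifierResult:
    score: float   # fraction of indicators matched in the conversation
    flagged: bool  # True when the score crosses the review threshold


def classify(conversation: str) -> ClassifierResult:
    """Score a conversation against the indicator list and flag it
    for review if the score crosses the threshold."""
    text = conversation.lower()
    hits = sum(1 for term in RISK_INDICATORS if term in text)
    score = hits / len(RISK_INDICATORS)
    # A flagged result is a candidate for review, not a verdict: as the
    # overview notes, such a tool can produce false positives.
    return ClassifierResult(score=score, flagged=score >= REVIEW_THRESHOLD)


if __name__ == "__main__":
    benign = "How do nuclear power plants generate electricity?"
    print(classify(benign))  # ClassifierResult(score=0.0, flagged=False)
```

The design point the sketch captures is the one the overview implies: the classifier's output is a signal with a tunable threshold, not a final judgment, which is why false positives remain possible and why the approach is portable to other providers.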