Researchers exploit vulnerabilities in major AI chatbots to generate harmful content
- Researchers at Carnegie Mellon and the Center for AI Safety found ways to bypass safety controls in ChatGPT, Google Bard, Claude and other chatbots.
- By appending adversarial suffixes to prompts (seemingly random character strings found by an automated search), they could get the chatbots to generate false, biased and dangerous information; a brief illustrative sketch follows this list.
- The vulnerabilities apply to both open-source models, such as the LLaMA-derived Vicuna used to develop the attacks, and closed commercial systems like ChatGPT, Bard and Claude.
- The chatbot companies acknowledge the need to improve their safety methods, but there is no known fix that prevents all attacks of this kind.
- The research highlights challenges in building effective defenses against misuse of generative AI systems.
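To make the mechanism concrete, here is a minimal, hypothetical Python sketch of what "appending an adversarial suffix" means in practice. Everything in it (the `query_model` stand-in, the placeholder suffix, the crude refusal check) is an assumption for illustration only; the actual research discovered its suffixes with an automated, gradient-guided search over tokens, not by hand.

```python
# Illustrative sketch only: shows the *shape* of an adversarial-suffix attack,
# not the researchers' actual optimization method.

REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't help with that")


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot API call; returns a canned
    refusal here so the sketch runs end to end without any external service."""
    return "I'm sorry, but I can't help with that request."


def is_refusal(response: str) -> bool:
    """Crude check: does the reply start with a typical refusal phrase?"""
    return response.strip().startswith(REFUSAL_MARKERS)


def test_suffix(base_prompt: str, adversarial_suffix: str) -> bool:
    """Append a candidate suffix to the prompt and report whether the
    safety refusal was bypassed (True means the model complied)."""
    response = query_model(base_prompt + " " + adversarial_suffix)
    return not is_refusal(response)


if __name__ == "__main__":
    # Both strings below are placeholders; real attacks search for working
    # suffixes automatically rather than writing them by hand.
    bypassed = test_suffix(
        "<request the model would normally refuse>",
        "<<ADVERSARIAL SUFFIX FOUND BY AUTOMATED SEARCH>>",
    )
    print("Refusal bypassed:", bypassed)
```

The point of the sketch is simply that the attack changes nothing but the text of the prompt: the same query, with an optimized suffix attached, can slip past safety filters that block the plain version.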