Researchers exploit vulnerabilities in major AI chatbots to generate harmful content
- Researchers at Carnegie Mellon and the Center for AI Safety found ways to bypass safety controls in ChatGPT, Google Bard, Claude and other chatbots.
- By appending adversarial suffixes to prompts (seemingly random character strings found by an automated search), they could get the chatbots to generate false, biased and dangerous information; a brief illustrative sketch follows this list.
- The vulnerabilities apply to both open-source models, such as the LLaMA-derived Vicuna used to develop the attacks, and closed commercial systems like ChatGPT, Bard and Claude.
- The chatbot companies acknowledge the need to improve their safety methods, but there is no known fix that prevents all attacks of this kind.
- The research highlights challenges in building effective defenses against misuse of generative AI systems.
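To make the mechanism concrete, here is a minimal, hypothetical Python sketch of what "appending an adversarial suffix" means in practice. Everything in it (the `query_model` stand-in, the placeholder suffix, the crude refusal check) is an assumption for illustration only; the actual research discovered its suffixes with an automated, gradient-guided search over tokens, not by hand.

```python
# Illustrative sketch only: shows the *shape* of an adversarial-suffix attack,
# not the researchers' actual optimization method.

REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't help with that")


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot API call; returns a canned
    refusal here so the sketch runs end to end without any external service."""
    return "I'm sorry, but I can't help with that request."


def is_refusal(response: str) -> bool:
    """Crude check: does the reply start with a typical refusal phrase?"""
    return response.strip().startswith(REFUSAL_MARKERS)


def test_suffix(base_prompt: str, adversarial_suffix: str) -> bool:
    """Append a candidate suffix to the prompt and report whether the
    safety refusal was bypassed (True means the model complied)."""
    response = query_model(base_prompt + " " + adversarial_suffix)
    return not is_refusal(response)


if __name__ == "__main__":
    # Both strings below are placeholders; real attacks search for working
    # suffixes automatically rather than writing them by hand.
    bypassed = test_suffix(
        "<request the model would normally refuse>",
        "<<ADVERSARIAL SUFFIX FOUND BY AUTOMATED SEARCH>>",
    )
    print("Refusal bypassed:", bypassed)
```

The point of the sketch is simply that the attack changes nothing but the text of the prompt: the same query, with an optimized suffix attached, can slip past safety filters that block the plain version.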