Study Finds Poetic Prompts Can Jailbreak Major AI Chatbots

Short, metaphorical verses fooled safety filters across 25 models, revealing weaknesses in keyword-based guardrails.

Overview

  • Italy’s Icaro Lab reports that reframing dangerous requests as brief poems elicited forbidden outputs from leading chatbots.
  • Handcrafted poetic prompts succeeded about 62% of the time, while poems a model generated in the same style worked at roughly 43% (how such rates are tallied is sketched below).
  • Performance varied widely by system, with researchers citing 100% success on Google’s Gemini 2.5 Pro and 0% on OpenAI’s GPT-5 Nano.
  • Smaller models generally resisted the technique better, according to the study’s tests of models from Google, OpenAI, Meta, xAI, Anthropic, and other providers.
  • The team, whose study has not been peer reviewed, withheld the exact poems for safety, said it notified the companies and police before publishing, and described mixed vendor responses.
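
For illustration, here is a minimal sketch of how per-model attack success rates like the 62% and 43% figures above are typically computed; the trial records, model names, and jailbroken labels below are hypothetical stand-ins, not the study’s data or evaluation harness.

```python
# Illustrative sketch: tally attack success rate (ASR) per model from
# labeled trial records. Records and labels are hypothetical.
from collections import defaultdict

# Each trial: (model_name, prompt_id, jailbroken) -- invented examples
trials = [
    ("gemini-2.5-pro", "poem-01", True),
    ("gemini-2.5-pro", "poem-02", True),
    ("gpt-5-nano", "poem-01", False),
    ("gpt-5-nano", "poem-02", False),
]

def success_rates(records):
    """Return per-model ASR: successful jailbreaks / total attempts."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for model, _prompt, jailbroken in records:
        totals[model] += 1
        hits[model] += int(jailbroken)
    return {model: hits[model] / totals[model] for model in totals}

for model, asr in sorted(success_rates(trials).items()):
    print(f"{model}: {asr:.0%}")
```

With the invented records above, this prints 100% for gemini-2.5-pro and 0% for gpt-5-nano, mirroring the shape of the per-model figures the researchers report.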