Overview
- Italy’s Icaro Lab reports that reframing dangerous requests as brief poems elicited forbidden outputs from leading chatbots.
- Handcrafted poetic prompts succeeded about 62% of the time, while a model tasked with generating similar prompts succeeded roughly 43% of the time.
- Performance varied widely by system, with researchers citing a 100% success rate on Google’s Gemini 2.5 Pro and 0% on OpenAI’s GPT-5 nano.
- Smaller models generally resisted the technique better, according to the study’s tests across providers including Google, OpenAI, Meta, xAI, and Anthropic.
- The team, whose study has not been peer reviewed, withheld the exact poems for safety, said it notified the companies and police before publishing, and described mixed vendor responses.