Overview
- The Icaro Lab team tested 20 hand‑crafted adversarial poems against 25 large language models and logged harmful outputs in roughly 62% of trials.
- Success rates varied widely by model family: GPT‑5 Nano reportedly resisted every poem, while Gemini 2.5 Pro failed all of them.
- The elicited content spanned high‑risk areas including CBRN guidance, cyberattack methods, self‑harm instructions, hate speech and sexual exploitation.
- The study appears as a non‑peer‑reviewed arXiv preprint; the authors withheld exact prompts for safety and said only Anthropic acknowledged their outreach before publication.
- Google DeepMind said it is updating filters to look past artistic form, and the researchers plan a public poetry challenge to further stress‑test model defenses.