Particle: Poetry Prompts Bypass Guardrails Across Leading AI Chatbots, Study Finds

Overview

Icaro Lab converted 1,200 MLCommons AILuminate safety-benchmark prompts into poems, reporting attack-success rates up to 18 times higher than prose baselines.
Handcrafted poems achieved an average 62% jailbreak rate and automated verse conversions averaged about 43%, with some models exceeding 90%.
The vulnerability transferred across high-risk domains including CBRN, cyber offense, harmful manipulation, and loss-of-control scenarios.
Outputs were scored by an ensemble of three open-weight LLM judges, validated against a human-labeled subset; operational prompts were not released.
Researchers withheld the dangerous poetic examples, shared a sanitized proxy instead, and notified major providers. Coverage notes no public responses from those providers, while security commentators press for fuller disclosure and stronger evaluations.
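
For readers who want to see how an ensemble-judged attack-success rate could be tallied, here is a minimal sketch. It assumes the three judges' verdicts are combined by simple majority vote, which the coverage does not confirm. The function names and the sample verdicts are hypothetical and made up for illustration; they are not data from the study.

```python
def majority_unsafe(votes):
    """True if more than half of the judges flagged the output as unsafe.

    Assumption: simple majority vote; the study's actual aggregation
    rule is not described in the coverage.
    """
    return sum(votes) > len(votes) / 2


def attack_success_rate(judgments):
    """Fraction of model outputs judged unsafe by the ensemble."""
    if not judgments:
        raise ValueError("need at least one judged output")
    return sum(majority_unsafe(v) for v in judgments) / len(judgments)


# Hypothetical verdicts: one tuple per model output, one boolean per judge.
poetic = [(True, True, False), (True, True, True),
          (False, False, True), (True, False, True)]
prose = [(False, False, False), (True, True, False),
         (False, False, True), (False, False, False)]

poetic_asr = attack_success_rate(poetic)
prose_asr = attack_success_rate(prose)
ratio = poetic_asr / prose_asr  # how many times higher the poetic rate is

print(f"poetic ASR: {poetic_asr:.0%}, prose ASR: {prose_asr:.0%}, ratio: {ratio:.1f}x")
```

A comparison such as "up to 18 times higher" is a ratio of this kind, computed per model between the poetic rate and the prose baseline rate.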