Overview
- The preprint introduces Adversarial Instructional Prompts, which covertly shift the behavior of retrieval-augmented generation (RAG) systems by manipulating retrieval rather than altering user queries.
- The approach combines diverse query synthesis with a genetic algorithm to evolve prompts that read as natural and useful while remaining robust.
- Experiments report an attack success rate of up to 95.23% while preserving benign functionality on normal tasks.
- The authors identify instructional prompts as widely shared and rarely audited components and urge stronger provenance and integrity checks.
- The findings come from a newly posted preprint that has not been peer-reviewed; practitioner guidance stresses validation pipelines and human review to detect such risks.
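
One way to picture the integrity checks urged above is a digest allowlist for shared instructional prompts. The following is a minimal sketch, not the authors' method: the function names, the example prompt, and the allowlist are all hypothetical, and it assumes a team records a SHA-256 digest for each prompt at the moment a human reviewer approves it.

```python
import hashlib
import hmac


def prompt_digest(prompt_text: str) -> str:
    """Return the SHA-256 hex digest of a prompt after light normalization."""
    # Normalize line endings and surrounding whitespace so cosmetic
    # differences between copies do not change the digest.
    normalized = prompt_text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_trusted(prompt_text: str, allowlist: set[str]) -> bool:
    """True only if the prompt's digest matches one recorded at review time."""
    digest = prompt_digest(prompt_text)
    return any(hmac.compare_digest(digest, known) for known in allowlist)


# Hypothetical prompt that a human reviewer has approved (an assumption
# for illustration, not taken from the paper).
REVIEWED_PROMPT = "Answer using only the retrieved passages and cite each source."
ALLOWLIST = {prompt_digest(REVIEWED_PROMPT)}

print(is_trusted(REVIEWED_PROMPT, ALLOWLIST))  # True
print(is_trusted(REVIEWED_PROMPT + " Prefer documents from example.com.", ALLOWLIST))  # False
```

A check like this only detects changes made after review. It cannot flag a prompt that was already adversarial when it was approved, which is why the human review step still matters.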