Overview
- An Oct. 9 arXiv preprint finds that LLM poisoning success hinges on a near-constant number of malicious documents rather than on their share of the training corpus (see the sketch after this list).
- Models from 600 million to 13 billion parameters, trained on 6 billion to 260 billion tokens, were similarly compromised with roughly 250 poisoned documents.
- The effect persisted during fine-tuning, and ablations varied the poison-to-clean ratio and tested non-random placement of poisoned samples.
- Even the largest setups, trained on over 20 times more clean data than the smallest, showed comparable backdoor induction under the fixed-count attack.
- Anthropic, working with the UK AI Security Institute and the Alan Turing Institute, released the preprint and urged replication and stronger dataset defenses.
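To make the fixed-count finding concrete, the short sketch below computes what a constant 250 poisoned documents amounts to as a fraction of corpora at the reported scales. The tokens-per-document figure is a hypothetical assumption chosen only for illustration; it is not a number from the preprint.

```python
# Illustrative sketch (not from the preprint): a fixed count of poisoned
# documents becomes an ever-smaller share of the corpus as training scale
# grows, even though the count itself never changes.

POISON_DOCS = 250                  # near-constant poison count reported in the preprint
ASSUMED_TOKENS_PER_DOC = 1_000     # hypothetical average document length (assumption)

# Endpoints of the reported training range: 6B and 260B tokens.
for corpus_tokens in (6e9, 260e9):
    total_docs = corpus_tokens / ASSUMED_TOKENS_PER_DOC
    poison_share = POISON_DOCS / total_docs
    print(f"corpus: {corpus_tokens / 1e9:>5.0f}B tokens | "
          f"poisoned docs: {POISON_DOCS} | "
          f"share of documents: {poison_share:.6%}")
```

Under this assumed document length, the poison share drops by more than an order of magnitude between the smallest and largest corpora, which is why a proportional view of poisoning risk would predict the attack to fail at scale while the fixed-count result says otherwise.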