Particle: Study Finds Viral Social Media Data Causes Lasting 'Brain Rot' in AI Models

Overview

Researchers at the University of Texas at Austin, Texas A&M, and Purdue continually pretrained Llama3 and Qwen variants on datasets of posts from X that were selected for virality or clickbait, then measured the effect on the models' cognitive performance.
Performance dropped in a dose-dependent way as the share of junk data rose: ARC-Challenge scores fell from 74.9 to 57.2 and RULER-CWE scores from 84.4 to 52.3. Popularity signals harmed reasoning more than low semantic quality did.
Degraded models showed a failure pattern dubbed “thought skipping,” yielding shorter, less-structured answers with more factual and logical errors.
Retraining on cleaner data produced only partial recovery, which the authors attribute to persistent representational drift that standard fine-tuning could not reverse.
All tested models declined, with Llama3 8B the most sensitive and Qwen3 4B relatively more resilient. The paper, which has not yet been peer reviewed, urges stricter data curation, provenance tracking, and routine cognitive health checks for models.
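
To put the benchmark figures quoted above in perspective, the following minimal Python sketch converts each pair of scores into an absolute and a relative drop. It uses only the four numbers reported in this summary. The helper name `score_drop` is hypothetical, and treating each pair as the scores at the lowest and highest junk ratios is an assumption, since the summary does not state the exact ratios.

```python
# Scores quoted in the summary, as (before, after) pairs.
# Assumption: "before" is the lowest junk ratio and "after" the highest.
scores = {
    "ARC-Challenge": (74.9, 57.2),
    "RULER-CWE": (84.4, 52.3),
}


def score_drop(before, after):
    """Return (absolute drop in points, relative drop as a percentage)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# Hypothetical helper applied to each benchmark.
drops = {name: score_drop(*pair) for name, pair in scores.items()}

for name, (absolute, relative) in drops.items():
    print(f"{name}: -{absolute:.1f} points ({relative:.1f}% relative drop)")
```

On these figures the decline works out to roughly 17.7 points (about 24% of the starting score) on ARC-Challenge and 32.1 points (about 38%) on RULER-CWE, which suggests the long-context benchmark was hit harder in relative terms.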