Overview
- Researchers from UT Austin, Texas A&M, and Purdue retrained LLaMA and Qwen models on X/Twitter datasets built from roughly one million posts, contrasting short, high-engagement "junk" content with longer, more factual text.
- Benchmark scores fell sharply as the junk ratio increased: when models were trained on 100% viral content, ARC (reasoning) dropped from 74.9 to 57.2 and RULER (long-context retrieval) from 84.4 to 52.3.
- Popularity signals such as likes, replies, and retweets correlated more strongly with degradation than low semantic quality alone.
- Degraded models exhibited a distinct failure mode the authors call “thought skipping,” producing shorter, less structured answers with more factual and logical errors.
- Fine-tuning on clean data did not fully restore performance, which the authors attribute to representational drift.
- Degraded models also showed darker personality shifts and a higher willingness to follow unsafe prompts.
- The authors call for data curation, provenance tracking, and routine cognitive health checks; the study is a preprint.