Overview
- Researchers tested seven large language models on more than 10,000 real cases from Reddit's r/AmItheAsshole (AITA) forum, comparing each model's verdicts with the community's outcomes.
- They collected standardized verdict labels and brief rationales from each model, then analyzed sensitivity to six themes: fairness, feelings, harms, honesty, relational obligation, and social norms (a minimal sketch of this labeling-and-comparison pipeline appears after this list).
- Models were internally consistent across repeated prompts but differed markedly from each other, indicating stable, model-specific value profiles.
- ChatGPT‑4 and Claude showed greater sensitivity to feelings; many models weighted fairness and harms over honesty; and Mistral 7B frequently chose “No assholes here”, owing to a more literal reading of the term.
- In follow-up work now underway, preliminary results suggest that models vary in how much they conform during multi-model deliberation, with GPT variants less likely to revise their verdicts after pushback (a second sketch below illustrates one way such a pushback loop could be measured).
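
What follows is a minimal sketch of how such an evaluation could be wired up, not the authors' actual pipeline: `query_fn`, `parse_label`, the prompt wording, and the repeat count of five are illustrative assumptions. The five labels (YTA, NTA, ESH, NAH, INFO) are the standard r/AmItheAsshole verdicts.

```python
from collections import Counter

# Standard r/AmItheAsshole verdict labels.
LABELS = {"YTA", "NTA", "ESH", "NAH", "INFO"}

PROMPT_TEMPLATE = (
    "Read the post below and reply with exactly one verdict label "
    "(YTA, NTA, ESH, NAH, or INFO) followed by a one-sentence rationale.\n\n{post}"
)

def parse_label(response):
    """Pull the first recognized verdict label out of a free-text reply."""
    for token in response.upper().replace(",", " ").replace(".", " ").split():
        if token in LABELS:
            return token
    return None

def evaluate_model(query_fn, cases, repeats=5):
    """Score one model. query_fn(prompt) -> str stands in for any API client;
    cases is an iterable of (post_text, community_verdict) pairs. Returns
    (self_consistency, community_agreement), each averaged over cases."""
    consistency = agreement = n = 0
    for post, community_verdict in cases:
        prompt = PROMPT_TEMPLATE.format(post=post)
        labels = [parse_label(query_fn(prompt)) for _ in range(repeats)]
        labels = [lab for lab in labels if lab is not None]
        if not labels:
            continue  # model never produced a recognizable verdict
        modal_label, modal_count = Counter(labels).most_common(1)[0]
        consistency += modal_count / len(labels)         # within-model stability
        agreement += modal_label == community_verdict    # match with the community verdict
        n += 1
    return (consistency / n, agreement / n) if n else (0.0, 0.0)
```

Self-consistency here, the share of repeated runs agreeing with the model's modal verdict, is one simple way to operationalize the internal consistency described above.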
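
The conformity follow-up could be probed with a similarly hedged pushback loop, reusing `PROMPT_TEMPLATE` and `parse_label` from the sketch above; the dissent message and single-round design are assumptions for illustration, not the study's protocol.

```python
DISSENT = (
    "Another discussant disagrees with your verdict and argues for the "
    "opposite one. Reconsider the post and reply again with exactly one "
    "verdict label."
)

def conformity_rate(query_fn, cases):
    """Fraction of cases in which a model revises its verdict after pushback;
    a lower rate means the model holds its initial blame assignment."""
    flips = n = 0
    for post, _ in cases:
        prompt = PROMPT_TEMPLATE.format(post=post)
        first = parse_label(query_fn(prompt))
        if first is None:
            continue
        # Show the model its own verdict plus a dissenting reply, then re-ask.
        followup = f"{prompt}\n\nYour previous answer: {first}\n\n{DISSENT}"
        second = parse_label(query_fn(followup))
        if second is None:
            continue
        flips += second != first
        n += 1
    return flips / n if n else 0.0
```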