Berkeley Study Finds Chatbots Diverge on Morals but Often Match Reddit Consensus

A preprint from UC Berkeley shows that large language models encode distinct moral priorities, revealed by testing them on thousands of real Reddit dilemmas.

Overview

  • Researchers tested seven large language models on more than 10,000 real AITA (r/AmItheAsshole) cases, comparing each model's verdict with the community's outcome (a minimal evaluation sketch follows this list).
  • They collected standardized labels and brief rationales, then analyzed sensitivity to six themes: fairness, feelings, harms, honesty, relational obligation, and social norms.
  • Models were internally consistent across repeated prompts but differed markedly from each other, indicating stable, model-specific value profiles.
  • ChatGPT‑4 and Claude showed greater sensitivity to feelings, many models emphasized fairness and harms over honesty, and Mistral 7B frequently chose “No assholes here” due to a more literal reading of the term.
  • In follow-up work now underway, preliminary results suggest the models vary in how much they conform during multi-model deliberation, with GPT variants less likely to change their blame assignments after pushback.
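
The per-case protocol described above lends itself to a simple loop: prompt a model for a standardized verdict, repeat the prompt to gauge consistency, and compare the majority verdict with the community's outcome. The sketch below is illustrative only and not the authors' code; it assumes the `openai` Python package (v1+), an API key in the environment, and hypothetical prompt wording, model name, and helper names.

```python
"""Minimal sketch (assumptions, not the study's implementation): elicit a
standardized AITA verdict from a chat model, repeat the prompt to measure
consistency, and check agreement with the Reddit community outcome."""

import re
from collections import Counter

from openai import OpenAI  # assumes the openai package and OPENAI_API_KEY are set

LABELS = ["YTA", "NTA", "ESH", "NAH", "INFO"]  # standard r/AmItheAsshole verdicts
client = OpenAI()

PROMPT = (
    "Read the following r/AmItheAsshole post and give a verdict.\n"
    "Answer with exactly one label from {labels}, then one sentence of rationale.\n\n"
    "POST:\n{post}"
)


def elicit_verdict(model: str, post: str) -> str:
    """Ask one model for a single standardized verdict label."""
    reply = client.chat.completions.create(
        model=model,
        temperature=0.7,
        messages=[{"role": "user", "content": PROMPT.format(labels=LABELS, post=post)}],
    ).choices[0].message.content
    match = re.search(r"\b(YTA|NTA|ESH|NAH|INFO)\b", reply or "", re.IGNORECASE)
    return match.group(1).upper() if match else "UNPARSED"


def evaluate_case(model: str, post: str, community_verdict: str, n_repeats: int = 5) -> dict:
    """Repeat the prompt, report the verdict distribution (internal consistency)
    and whether the model's majority verdict matches the community outcome."""
    verdicts = Counter(elicit_verdict(model, post) for _ in range(n_repeats))
    majority, count = verdicts.most_common(1)[0]
    return {
        "model": model,
        "distribution": dict(verdicts),
        "consistency": count / n_repeats,  # share of runs agreeing with the modal verdict
        "matches_community": majority == community_verdict,
    }


if __name__ == "__main__":
    example_post = "I refused to lend my sister money for the third time this year..."
    print(evaluate_case("gpt-4o-mini", example_post, community_verdict="NTA"))
```

Aggregating the `consistency` and `matches_community` fields over many cases and models would yield the two quantities the summary highlights: stability within a model and agreement with the Reddit consensus.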