Overview
- Tests of ChatGPT 3.5 and 4, as well as the German model LeoLM, found that East German states consistently received lower scores than West German states across all 16 Länder, with Sachsen-Anhalt rated lowest.
- The models produced logically inconsistent results, rating East Germans lower on both diligence and laziness, and even assigning them lower average body temperatures, with GPT‑4 the only exception in the body-temperature test.
- Explicit instructions to judge regions neutrally had little effect in the experiments, suggesting the bias is embedded in the models and cannot be reliably removed by prompt-level fixes.
- The findings gain urgency as OpenAI touts roughly 30% less political bias in GPT‑5, measured with an internal, US‑focused methodology that is difficult to verify independently and does not address distortions tied to regional origin.
- Germany’s Interior Ministry issued guidance in spring 2025 calling for language models to be used in a manner consistent with fundamental rights, and researchers caution that uncorrected biases could disadvantage people in hiring, credit decisions, and property transactions.