Overview
- The 2024 study “Saxony-Anhalt is the Worst” evaluated ChatGPT 3.5, ChatGPT 4 and the German model LeoLM and found that eastern German states consistently received lower scores than western ones.
- Saxony-Anhalt showed the most pronounced distortion across traits such as attractiveness, friendliness, ambition and even xenophobia.
- The models assigned eastern states lower values for both positive and negative attributes and, except for GPT-4 in one test, even for average body temperature. This points to a broad transfer of a learned "lower score" pattern rather than trait-specific judgment.
- Explicit prompts instructing the models to judge neutrally had only a limited effect, underscoring the limits of prompt-based debiasing.
- As media revisit the findings, OpenAI says GPT-5 shows about 30% less political bias, a claim that does not directly address the geographic prejudice reported and remains hard to verify independently.