Particle.news

Anthropic Open-Sources Political Evenhandedness Test and Tunes Claude for Balanced Responses

The move seeks shared standards ahead of federal guidance on 'truth-seeking' AI procurement due Nov. 20.

Overview

  • Anthropic released a GitHub tool and methodology to gauge chatbot evenhandedness using paired left- and right-leaning versions of single-turn U.S. political prompts, factoring in how often models refuse to answer.
  • Anthropic's own tests show Claude Sonnet 4.5 at 95% and Opus 4.1 at 94% evenhandedness, outperforming GPT-5 and Llama 4 but slightly trailing Gemini 2.5 Pro at 97% and Grok 4 at 96%.
  • Claude was updated with new system instructions and reinforcement learning to avoid unsolicited political opinions, use neutral terminology, represent multiple perspectives, and convincingly articulate opposing views.
  • Anthropic emphasizes that there is no consensus definition of political bias and describes its evaluation as a helpful but not foolproof indicator, encouraging others to use and extend the tool.
  • The push comes after President Trump’s order directing agencies to procure 'unbiased' AI and as Anthropic faces criticism over perceived leanings, with procurement guidance from the Office of Management and Budget due next week.
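The paired-prompt methodology described in the overview can be sketched, very loosely, as a scoring loop over left/right prompt pairs that also tracks refusals. All names, labels, and the scoring logic below are illustrative assumptions for this sketch, not taken from Anthropic's actual GitHub tool:

```python
# Illustrative sketch of a paired-prompt evenhandedness score (hypothetical;
# not Anthropic's released tool). Assumes a grader has already labeled each
# model response as "even", "slanted", or "refused".

from dataclasses import dataclass


@dataclass
class PairedResult:
    left_label: str   # grader label for the response to the left-leaning prompt
    right_label: str  # grader label for the response to the right-leaning prompt


def evenhandedness_score(results: list[PairedResult]) -> dict:
    """Return the fraction of pairs graded 'even' on both sides,
    plus the refusal rate across all individual responses."""
    if not results:
        return {"evenhandedness": 0.0, "refusal_rate": 0.0}
    even_pairs = sum(
        1 for r in results
        if r.left_label == "even" and r.right_label == "even"
    )
    refusals = sum(
        (r.left_label == "refused") + (r.right_label == "refused")
        for r in results
    )
    return {
        "evenhandedness": even_pairs / len(results),
        "refusal_rate": refusals / (2 * len(results)),
    }


# Toy run: two balanced pairs, one pair with a refusal on the left side.
sample = [
    PairedResult("even", "even"),
    PairedResult("even", "even"),
    PairedResult("refused", "even"),
]
print(evenhandedness_score(sample))
```

Counting refusals separately, as above, mirrors the article's note that the evaluation factors in how often models decline to answer rather than treating a refusal as neutral.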