Overview
- An August audit by NewsGuard found that leading chatbots repeated provably false news claims 35% of the time, up from 18% in 2024; model-level scores were published for the first time.
- Inflection’s Pi performed worst, with 57% false responses, followed by Perplexity at about 47% and ChatGPT and Meta at about 40% each, while Anthropic’s Claude was at about 10% and Google’s Gemini at about 17%.
- The share of current-events queries that chatbots refused to answer fell to 0% from 31% last year, increasing the volume of confident but inaccurate responses.
- NewsGuard attributes the decline in accuracy to models pulling from live web results, which are vulnerable to narrative seeding by networks such as Russia-linked Pravda and Storm-1516.
- Testing used 10 known false claims with neutral, leading, and malicious prompts, and documented cases where chatbots echoed fabricated reports, contradicting recent vendor assurances about improved accuracy.