Overview
- An EBU- and BBC-led project reviewed 3,000 answers from ChatGPT, Copilot, Gemini and Perplexity across 14 languages using blind assessments by journalists in 18 countries.
- Researchers reported problems in 81% of outputs: 45% showed at least one significant issue, 31% had serious sourcing failures, and 20% contained major factual errors.
- Gemini showed the weakest sourcing performance, with about 72% of its responses carrying significant attribution problems, far above the other assistants.
- Examples included outdated or invented details, such as naming the wrong German chancellor, describing Pope Francis as still in office after his death had been reported, and presenting satire as fact.
- The findings warn of growing risks as 7% of online news consumers (and 15% of those under 25) use AI assistants for news, and call for independent monitoring alongside improvements from the companies themselves.