Overview
- The peer-reviewed study, led by Université de Montréal and published in Scientific Reports, compared GPT-4, ChatGPT, Claude, Gemini, and other models against responses from more than 100,000 human participants.
- On the Divergent Association Task, several advanced models scored above the human average, though their results clustered in the midrange of human performance rather than at the top (a scoring sketch follows this list).
- In additional tests involving haiku, short stories, and film plot summaries, AI systems sometimes beat the human average but still fell short of the most creative individuals.
- The most creative half of human participants outperformed every model tested, and the gap was widest against the top 10%.
- Adjusting sampling temperature and refining prompts substantially changed model outputs (see the second sketch after this list), supporting the view of these systems as tools for idea generation rather than replacements for top human creators.
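For concreteness, here is a minimal sketch of how Divergent Association Task scores are commonly computed: the mean pairwise semantic distance between the nouns a participant lists, scaled to 0-100. The toy embeddings below are illustrative assumptions; the published task relies on pretrained word vectors (e.g., GloVe), and this is not a reproduction of the paper's exact pipeline.

```python
# Sketch of DAT scoring: mean pairwise cosine distance between word vectors.
from itertools import combinations
import numpy as np

def dat_score(words: list[str], embeddings: dict[str, np.ndarray]) -> float:
    """Average pairwise cosine distance between word vectors, scaled x100."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    dists = []
    for a, b in combinations(vecs, 2):
        cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        dists.append(1.0 - cos_sim)  # cosine distance = 1 - cosine similarity
    return 100.0 * float(np.mean(dists))

# Toy random vectors for illustration only; real scoring needs trained embeddings.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=50) for w in ["cat", "quantum", "vinegar", "sonnet"]}
print(round(dat_score(list(toy), toy), 1))
```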
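And a hedged sketch of the kind of temperature sweep the last point alludes to: higher sampling temperature flattens the token distribution and tends to produce more varied completions. This uses the OpenAI Python SDK; the model name, prompt, and temperature values are assumptions for illustration, not the study's configuration.

```python
# Sample the same prompt at several temperatures to compare output variability.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = "Name one surprising noun, nothing else."

for temperature in (0.2, 0.7, 1.2):
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model; swap in whichever model you test
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    print(temperature, resp.choices[0].message.content)
```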