Particle.news

Apple Study Finds Reasoning Models Collapse on Complex Puzzles

Findings fuel debate over whether the drop-off reflects AI’s fundamental reasoning limits or test design shortcomings.

Overview

  • Apple’s preprint “The Illusion of Thinking” tested leading large reasoning models on four controlled puzzles whose difficulty scales with a single parameter: Tower of Hanoi, Checker Jumping, River Crossing and Blocks World (a Tower of Hanoi scaling sketch follows this list).
  • Models solved low-complexity instances reliably, but accuracy collapsed to near zero at high complexity, and the models cut their reasoning effort (thinking tokens) as tasks approached that collapse point.
  • Thinking-mode variants outperformed their non-thinking counterparts only at medium complexity; at low complexity standard models matched or beat them, and at high complexity both collapsed.
  • Critics argue that output-token limits and puzzle design, rather than inherent reasoning flaws, explain the models’ failures, and they call for more realistic evaluation scenarios.
  • Experts remain divided on whether the results expose fundamental AI reasoning barriers or highlight the need for refined testing methods ahead of AGI development.
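
As a rough illustration of why these puzzles suit controlled-complexity testing, the sketch below (not code from the paper, and covering only Tower of Hanoi) shows that an n-disk instance has an exact minimum solution of 2^n - 1 moves, so each added disk roughly doubles the required solution length.

```python
# Minimal sketch, not from the Apple study: the classic recursive Tower of Hanoi
# solver. Its output length, 2**n - 1 moves for n disks, is what lets evaluators
# ramp difficulty smoothly by adding one disk at a time.

def hanoi_moves(n: int, source: str = "A", target: str = "C", spare: str = "B"):
    """Return the minimum-length sequence of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # park the top n-1 disks on the spare peg
        + [(source, target)]                         # move the largest disk to the target
        + hanoi_moves(n - 1, spare, target, source)  # restack the n-1 disks on top of it
    )

if __name__ == "__main__":
    for n in range(1, 11):
        moves = hanoi_moves(n)
        assert len(moves) == 2**n - 1                # solution length grows exponentially
        print(f"{n} disks -> {len(moves)} moves")
```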