Particle.news

Apple Study Reveals AI Reasoning Models Collapse on Complex Problems

Researchers found that models cut their reasoning effort at critical complexity points despite ample computational resources.

Overview

  • Apple researchers tested large reasoning models and standard language models on controlled puzzles with adjustable difficulty, including Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World.
  • Accuracy for reasoning models declined steadily as puzzle complexity increased, eventually dropping to zero beyond a model-specific threshold.
  • Models paradoxically reduced their chain-of-thought token usage when approaching their collapse point instead of leveraging available compute.
  • Even when provided with exact step-by-step solution algorithms, reasoning models failed to execute those instructions reliably on high-complexity tasks.
  • Experts such as Gary Marcus warn that these findings expose fundamental barriers to generalizable reasoning and could stall progress toward artificial general intelligence.
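The puzzles scale difficulty predictably: Tower of Hanoi, for example, has a well-known recursive solution requiring exactly 2^n − 1 moves for n disks, so adding disks raises complexity in controlled steps. A minimal sketch of that standard algorithm (illustrative only, not Apple's test harness):

```python
def hanoi(n, source, target, spare, moves):
    """Recursively solve Tower of Hanoi, recording each move as (from_peg, to_peg)."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # move n-1 disks out of the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 2**10 - 1 = 1023 moves
```

Each added disk doubles the required move count, which is what lets researchers probe exactly where a model's accuracy falls to zero.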