Particle.news


Apple Study Reveals AI Reasoning Models Collapse on Complex Problems

Researchers found that the models cut back their reasoning effort at critical complexity points, even though they still had an adequate token budget available.

Apple researchers say models like ChatGPT o3 look smart but collapse when faced with real complexity

Overview

  • Apple tested large reasoning models and standard language models on controlled puzzles including Tower of Hanoi, River Crossing, Checker Jumping and Blocks World with adjustable difficulty.
  • Accuracy for reasoning models declined steadily as puzzle complexity increased, eventually dropping to zero beyond a model-specific threshold.
  • Paradoxically, models reduced their chain-of-thought token usage as they approached their collapse point, rather than using the compute still available to them.
  • Even when provided with exact step-by-step solution algorithms, reasoning models failed to execute instructions reliably on high-complexity tasks.
  • Experts such as Gary Marcus warn that these findings expose fundamental barriers to generalizable reasoning and could stall progress toward artificial general intelligence.
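To illustrate what "adjustable difficulty" means for a puzzle like Tower of Hanoi, here is a minimal Python sketch. It is not Apple's evaluation code, and the function names `hanoi_moves` and `is_valid_solution` are hypothetical. Difficulty is set by the number of disks n, and the shortest solution takes 2^n - 1 moves, so each added disk roughly doubles the number of steps a model must produce without a single mistake. The recursive solver is the kind of explicit step-by-step procedure the overview describes. The checker replays a move list and rejects it at the first illegal move, which is one plausible way, assumed here, to grade such answers move by move.

```python
def hanoi_moves(n, src=0, aux=1, dst=2):
    """Return the optimal move list for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then stack the n-1 back on top.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(n, src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))


def is_valid_solution(n, moves):
    """Replay moves from the start state; True only if every move is legal and all disks end on peg 2."""
    pegs = [list(range(n, 0, -1)), [], []]  # peg 0 holds disks n..1, largest at the bottom
    for disk, a, b in moves:
        if not pegs[a] or pegs[a][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[b] and pegs[b][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[b].append(pegs[a].pop())
    return pegs[2] == list(range(n, 0, -1))


for n in (3, 7, 10, 15):
    moves = hanoi_moves(n)
    print(n, len(moves), is_valid_solution(n, moves))
# prints:
# 3 7 True
# 7 127 True
# 10 1023 True
# 15 32767 True
```

The exponential growth in solution length is why a fixed error rate per move can turn into near-zero accuracy once the disk count passes some threshold, even when the algorithm itself is simple.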