Overview
- Apple researchers tested Large Reasoning Models (LRMs) and their non-reasoning counterparts on a suite of mathematical puzzles of varying complexity.
- Non-reasoning models matched or exceeded reasoning models on simple tasks, and reasoning models showed only slight advantages on medium-difficulty puzzles.
- All models suffered a dramatic accuracy collapse on high-complexity puzzles regardless of available computational power.
- Analysis revealed that LRMs do not apply explicit algorithms and produce inconsistent reasoning chains, which calls their logical thinking capabilities into question.
- The study also noted that reasoning models consumed more energy and had longer response times, highlighting efficiency concerns alongside their performance limits.