Particle.news

Atari 2600 Video Chess Defeats Both ChatGPT and Microsoft Copilot

The experiment underscores the inability of large language models to maintain persistent memory in rule-based tasks.

Overview

  • Citrix engineer Robert Caruso orchestrated controlled matches pitting ChatGPT and Microsoft Copilot, in separate games, against an emulated Atari 2600 running the 1979 Video Chess program.
  • When challenged in mid-June, ChatGPT repeatedly lost track of piece positions, confusing rooks and bishops and failing to maintain board continuity.
  • In early July, Copilot lost two pawns, a knight and a bishop by the seventh turn before conceding with a gracious resignation praising the “vintage silicon mastermind.”
  • Video Chess runs in just 4 KB of ROM on a 1.19 MHz CPU, using simple rule-based logic and a one- to two-move lookahead, yet it tracks the game state accurately from move to move.
  • The outcomes highlight that general-purpose LLMs, despite extensive training on chess data, lack built-in mechanisms for persistent memory and strict rule adherence needed for multi-turn strategy tasks.
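
To make the contrast concrete, the sketch below shows what explicit game-state tracking can look like in a few lines of Python. It is an illustration only, not the Video Chess code and not anything used in Caruso's experiment; the names `initial_board` and `apply_move` are hypothetical, and move legality is deliberately not checked. The point is simply that a program holding the board in a data structure cannot "forget" where a piece stands between turns.

```python
# Illustrative sketch only: hypothetical helpers, not the Video Chess program.
FILES = "abcdefgh"


def initial_board():
    """Return a dict mapping squares such as 'e2' to piece codes such as 'wP'."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, f in enumerate(FILES):
        board[f + "1"] = "w" + back_rank[i]  # white back rank
        board[f + "2"] = "wP"                # white pawns
        board[f + "7"] = "bP"                # black pawns
        board[f + "8"] = "b" + back_rank[i]  # black back rank
    return board


def apply_move(board, src, dst):
    """Move the piece on src to dst and return the captured piece code, if any.

    Legality is NOT checked; this only demonstrates that state persists.
    """
    if src not in board:
        raise ValueError(f"no piece on {src}")
    captured = board.get(dst)      # None when the target square is empty
    board[dst] = board.pop(src)    # the source square becomes empty
    return captured


board = initial_board()
apply_move(board, "e2", "e4")
apply_move(board, "d7", "d5")
captured = apply_move(board, "e4", "d5")  # white pawn takes on d5
```

After these three moves the dictionary still holds every remaining piece on its exact square, so a rook on a1 stays a rook on a1 no matter how many turns pass. A language model generating moves as text has no such structure unless one is supplied from outside.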