Overview
- SIMA 2 integrates Google’s Gemini (2.5 Flash‑Lite) to add planning, reasoning, and explanation capabilities in unfamiliar virtual environments.
- DeepMind reports a task completion rate of about 65% for SIMA 2 versus 31% for SIMA 1, roughly double the prior model's rate.
- SIMA 2 was trained on human gameplay from multiple commercial titles and accepts instructions via text, voice, on‑screen sketches, images, and emojis.
- The system can operate in entirely new Genie 3–generated worlds and uses Gemini to create tasks and feedback for trial‑and‑error self‑improvement.
- Researchers and outside experts note limitations, including difficulty with long multi‑step tasks, constrained memory, and weaker low‑level control, and no timeline has been given for real‑world robotics.
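
The self-improvement loop mentioned above (a model proposes tasks, the agent attempts them, and a model scores the attempts) can be illustrated with a toy sketch. This is not DeepMind's implementation: `propose_task` and `score_attempt` are hypothetical stand-ins for the Gemini task generator and feedback model, and `ToyAgent` replaces the real policy with a per-task success probability.

```python
import random
from dataclasses import dataclass, field

# Hypothetical task pool; in the described system the tasks come from Gemini.
TASKS = ["open the door", "collect wood", "climb the ladder"]


def propose_task(rng: random.Random) -> str:
    """Stand-in for a model-generated task (assumption, not the real API)."""
    return rng.choice(TASKS)


def score_attempt(success: bool) -> float:
    """Stand-in for model-generated feedback: 1.0 on success, else 0.0."""
    return 1.0 if success else 0.0


@dataclass
class ToyAgent:
    # Per-task success probability; a crude substitute for a learned policy.
    skill: dict = field(default_factory=dict)

    def attempt(self, task: str, rng: random.Random) -> bool:
        return rng.random() < self.skill.get(task, 0.1)

    def learn(self, task: str, reward: float, lr: float = 0.2) -> None:
        # Nudge the success probability toward 1.0 in proportion to the reward.
        p = self.skill.get(task, 0.1)
        self.skill[task] = p + lr * reward * (1.0 - p)


def self_improve(rounds: int, seed: int = 0):
    """Run the propose -> attempt -> score -> learn loop for `rounds` steps."""
    rng = random.Random(seed)
    agent = ToyAgent()
    history = []
    for _ in range(rounds):
        task = propose_task(rng)
        success = agent.attempt(task, rng)
        reward = score_attempt(success)
        agent.learn(task, reward)
        history.append((task, reward))
    return agent, history


if __name__ == "__main__":
    agent, history = self_improve(rounds=200, seed=1)
    for task, p in sorted(agent.skill.items()):
        print(f"{task}: estimated success probability {p:.2f}")
```

In this sketch the agent only improves on a task after a rewarded attempt, which is the trial-and-error element; the real system's learning signal and update rule are not described in the source.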