Overview
- SIMA 2 runs on Gemini 2.5 Flash-Lite, which lets the agent reason about goals, explain its steps, and follow instructions given as text, voice, sketches, or emojis.
- DeepMind reports roughly doubled performance over SIMA 1: in its tests, SIMA 2 completed 65% of complex tasks, versus 31% for SIMA 1.
- Trained on human gameplay across eight commercial titles including No Man’s Sky and Goat Simulator 3, the agent generalizes to unfamiliar 3D environments.
- Genie 3 generates new photorealistic worlds and tasks, and a reward model scores attempts so the agent learns new behaviors through trial and error with minimal new human data.
- Researchers note that the agent still struggles with long multi-step missions, a short memory window, and precise mouse/keyboard control; it is available only as a limited research preview, with no timeline for robotics.
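
The trial-and-error loop described above (generated tasks, scored attempts, learning without new human demonstrations) can be illustrated with a toy sketch. This is not DeepMind's training method: the tasks, actions, `generate_task`, `reward_model`, and the epsilon-greedy update are all hypothetical stand-ins, chosen only to show how a scoring signal alone can shape behavior.

```python
import random

# Hypothetical toy tasks and the action that solves each one.
# These names are invented for illustration only.
SOLUTIONS = {"collect wood": "chop", "cross river": "swim", "open door": "push"}
ACTIONS = ["chop", "swim", "push"]


def generate_task(rng):
    """Stand-in for a world/task generator: sample a task to attempt."""
    return rng.choice(list(SOLUTIONS))


def reward_model(task, action):
    """Stand-in for a learned reward model: score an attempt in [0, 1]."""
    return 1.0 if SOLUTIONS[task] == action else 0.0


def train(episodes=2000, epsilon=0.2, lr=0.5, seed=0):
    """Learn per-task action values from reward scores alone."""
    rng = random.Random(seed)
    # Estimated value of each action per task; starts with no knowledge.
    values = {task: {a: 0.0 for a in ACTIONS} for task in SOLUTIONS}
    for _ in range(episodes):
        task = generate_task(rng)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try a random action
        else:
            action = max(ACTIONS, key=lambda a: values[task][a])  # exploit
        score = reward_model(task, action)
        # Move the estimate toward the observed score.
        values[task][action] += lr * (score - values[task][action])
    return values


def best_action(values, task):
    """Return the action the agent currently rates highest for a task."""
    return max(ACTIONS, key=lambda a: values[task][a])


if __name__ == "__main__":
    learned = train()
    for task in SOLUTIONS:
        print(task, "->", best_action(learned, task))
```

The agent never sees `SOLUTIONS` directly; it only observes scores for its own attempts, which is the sense in which the loop needs no new human data. A real system would replace the lookup table with a learned policy and the exact-match check with a learned scorer.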