Overview
- Genie 3 generates 3D environments in real time at 720p and 24 fps, building each frame auto-regressively from the frames before it, with a visual memory that preserves consistency for up to a minute.
- The model supports promptable world events, allowing users to alter weather, introduce objects or characters, and reshape scenarios via text commands.
- Building on Genie 2 and the Veo 3 video model, it extends continuous interaction from seconds to minutes and shows a stronger intuitive grasp of physics.
- DeepMind intends to use Genie 3 as a training ground for embodied agents, positioning world models as essential stepping stones toward artificial general intelligence.
- Access is limited to a small cohort of academics and creators, and the system still faces challenges with multi-agent dynamics and sustaining continuous interactions beyond a few minutes.
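
To make the frame-by-frame idea in the first bullet concrete, here is a minimal toy sketch of an auto-regressive rollout with a bounded memory window. It is not Genie 3's implementation, which is not described here: `next_frame` and `rollout` are hypothetical names, frames are plain integers standing in for images, and the only figures taken from the overview are the 24 fps frame rate and the one-minute memory horizon (24 × 60 = 1,440 frames).

```python
from collections import deque

FPS = 24                  # frame rate stated in the overview
MEMORY_SECONDS = 60       # "up to a minute" of visual memory
MEMORY_FRAMES = FPS * MEMORY_SECONDS  # 1440 frames fit in the window


def next_frame(history, action):
    """Hypothetical stand-in for the model.

    A real world model would condition on past frames and the user's
    input; this toy version just adds the action to the latest frame.
    """
    previous = history[-1] if history else 0
    return previous + action


def rollout(actions, memory_frames=MEMORY_FRAMES):
    """Generate one frame per action, auto-regressively.

    Each new frame depends on earlier generated frames, and only the
    most recent `memory_frames` frames are retained as context.
    """
    history = deque(maxlen=memory_frames)  # oldest frames drop out
    frames = []
    for action in actions:
        frame = next_frame(history, action)
        history.append(frame)
        frames.append(frame)
    return frames, history


if __name__ == "__main__":
    frames, history = rollout([1] * 2000)
    print(len(frames), len(history))
```

The `deque` with `maxlen` is one simple way to model a fixed memory horizon: once more than a minute of frames has been generated, the oldest ones are no longer available as context, which is one intuition for why consistency is harder to sustain over longer sessions.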