Overview
- V-JEPA 2 extends last year’s video-trained world model by learning physical interactions directly from raw footage, without labeled data
- Meta demonstrated the model guiding lab robots, using its predictions of object physics for tasks such as reaching for, grasping, and repositioning items
- According to Meta, V-JEPA 2 delivers physical-world predictions 30 times faster than Nvidia’s Cosmos model
- The release includes three open benchmarks to help researchers evaluate AI systems’ video-based learning and reasoning capabilities
- Meta positions V-JEPA 2 as a core advance toward its goal of building AI agents that can think before they act