Overview
- SAM 3 detects, segments and tracks objects in images and video from natural-language prompts, including conditionals and exclusions.
- SAM 3D reconstructs 3D objects, scenes and human body shape from a single image and debuts an artist-curated evaluation dataset.
- Meta published model weights, checkpoints, code, benchmarks and research papers, with access available now through the Segment Anything Playground.
- Reported results include 47.0 zero-shot mask AP on LVIS and roughly 30 milliseconds per frame on H200 GPUs while handling over 100 objects.
- Early product uses include selective edits in the Edits app and Vibes, plus a Marketplace ‘View in Room’ feature for previewing items at home.