Overview
- Molmo models are fully open-source, offering an accessible alternative to proprietary vision-language models.
- The models are trained on PixMo, a high-quality dataset of detailed image captions collected from human annotators.
- Molmo-72B, the largest model, achieved the highest average score across 11 academic benchmarks, ahead of several leading proprietary models.
- Innovative training techniques reportedly allow Molmo to use roughly 1,000 times less training data than competing models while maintaining high performance.
- The release aims to foster open research and innovation by making all model weights, datasets, and source code publicly available.