Allen Institute for AI Launches Molmo: Open-Source Multimodal Models Rivaling Tech Giants

Molmo models outperform proprietary systems like GPT-4o and Claude 3.5 Sonnet on several benchmarks using significantly less data.

Overview

Molmo models are entirely open-source, providing accessible alternatives to proprietary vision-language models.
The models are trained on PixMo, a high-quality dataset of detailed image captions written by human annotators.
Molmo-72B, the largest model in the family, outperformed leading proprietary models on 11 academic benchmarks.
Innovative training techniques let Molmo achieve high performance while using roughly one-thousandth of the data that competitors rely on.
The release aims to foster open research and innovation by making all model weights, datasets, and source code publicly available.