The Allen Institute for AI Releases Molmo, an Open Family of High-Performance Multimodal AI Models

Molmo runs the gamut from one billion active parameters to 72 billion, and, the company claims, outperforms closed competitors.

The Allen Institute for AI (Ai2), named for its late founder Paul Allen of Microsoft fame, has announced the release of Molmo, a family of multimodal image-and-text artificial intelligence (AI) models that, it says, proves that open models can go toe-to-toe with closed, proprietary equivalents.

"Molmo is an incredible AI model with exceptional visual understanding, which pushes the frontier of AI development by introducing a paradigm for AI to interact with the world through pointing," claims Ai2 researcher Matt Dietke of the company's latest work. "The model's performance is driven by a remarkably high quality curated dataset to teach AI to understand images through text. The training is so much faster, cheaper, and simpler than what's done today, such that the open release of how it is built will empower the entire AI community, from startups to academic labs, to work at the frontier of AI development."

Ai2 has launched Molmo, an open family of multimodal artificial intelligence models it says beat the competition at a much smaller size. (📹: The Allen Institute for AI)

The Molmo models are released under the permissive Apache 2.0 license, with Ai2 promising to publish all artefacts for each: language and vision training data, fine-tuning data, model weights, and source code. All models are multimodal, capable of processing both images and text, and come in a range of sizes: Molmo-72B, with 72 billion parameters, is the largest and most powerful model, while the smallest is MolmoE-1B, a mixture-of-experts model designed for on-device use that activates one billion of its seven billion total parameters at a time.

While parameters measured in the billions might not seem "small," the majority of the models are positively minuscule in comparison with the competition, and that extends to the training datasets, too. "Multimodal AI models are typically trained on billions of images," explains Ai2 senior director of research Ani Kembhavi. "We have instead focused on using extremely high quality data but at a scale that is 1,000 times smaller. This has produced models that are as powerful as the best proprietary systems, but with fewer hallucinations and much faster to train, making our model far more accessible to the community."

The company claims the models outperform competitors both open and closed in both academic benchmarks and human preference scores. (📷: The Allen Institute for AI)

Despite the smaller model sizes, smaller training dataset, and open release, Ai2 claims that the Molmo family can outperform closed rivals including OpenAI's GPT-4o and GPT-4V, Google's Gemini 1.5 Pro, and Anthropic's Claude 3.5 Sonnet in a range of benchmarks, and in the all-important measure of human preference, too. The latter is aided by something surprisingly simple: pointing. "By learning to point at what it perceives," Ai2 explains, "Molmo enables rich interactions with physical and virtual worlds, empowering the next generation of applications capable of acting and interacting with their environments."
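That pointing ability surfaces directly in the model's text output: per Ai2's description, Molmo marks locations with XML-style point tags whose coordinates are expressed as percentages of the image's width and height. The exact markup and the example reply below are assumptions for illustration, as is this small parsing sketch.

```python
import re

IMAGE_W, IMAGE_H = 536, 354  # size of the image that was sent to the model

# Hypothetical reply to the prompt "Point to the mug."; the <point> markup
# and 0-100 percentage coordinate scale are assumptions based on Ai2's
# description of the pointing feature.
reply = '<point x="61.5" y="40.2" alt="mug">mug</point>'

# Extract each point and convert its percentage coordinates to pixels.
for m in re.finditer(r'<point x="([\d.]+)" y="([\d.]+)"', reply):
    x, y = float(m.group(1)), float(m.group(2))
    print(f"pixel location: ({x / 100 * IMAGE_W:.0f}, {y / 100 * IMAGE_H:.0f})")
```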

More information on the new models is available on the Ai2 blog, while a live demo can be found on the company's website; the models themselves are on Hugging Face, along with a paper detailing their creation. The company has pledged to release additional weights and checkpoints, training code, evaluation code, the PixMo dataset family, and a more detailed paper within the next two months.
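For anyone who wants to try the models locally, a minimal sketch of querying a Molmo checkpoint through the Hugging Face transformers library follows. The repository name and the process() and generate_from_batch() helpers mirror the usage shown on the Molmo model cards at the time of writing; treat them as assumptions to verify against the card rather than a definitive recipe.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumed repository name; check Ai2's Hugging Face collection

# Molmo ships custom modelling code, so trust_remote_code must be enabled.
processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Fetch a test image and pair it with a text prompt.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate a response and decode only the newly produced tokens.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```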
