Ai2 Launches OLMo 2, a Fully Open-Source Language Model

2024-11-28

Recently, the AI research organization Ai2 unveiled the latest addition to its open language model lineup, OLMo 2. Unlike popular open-weight models such as Llama and Gemma, OLMo (Open Language Model) provides not only model weights but also development tools, datasets, training recipes, and a full suite of supporting resources, making the release open source end to end.

The OLMo 2 series includes 7-billion-parameter and 13-billion-parameter models, each in base and instruction-tuned variants, all demonstrating strong performance. Notably, the 7B model outperforms Meta's Llama 3.1 8B on English academic benchmarks, while the 13B model surpasses Qwen 2.5 7B despite using less training compute.

The release builds on the first OLMo model launched earlier this year. The Ai2 team employed a two-stage training strategy: initial pretraining on a large-scale dataset of 3.9 trillion tokens, followed by a second stage trained on high-quality academic materials, math problem sets, and instruction datasets.
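
As a rough illustration of what such a two-stage schedule looks like, the sketch below encodes the stages as a simple data structure; the dataset names and the stage-2 token budget are placeholders for illustration, not Ai2's actual training configuration.

```python
# Illustrative sketch of a two-stage pretraining schedule of the kind
# described above. Dataset names and the stage-2 token budget are
# placeholders, not Ai2's actual configuration.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str
    sources: List[str]             # data mixtures used in this stage
    token_budget: Optional[float]  # approximate number of training tokens


curriculum = [
    # Stage 1: large-scale pretraining on a web-scale corpus (~3.9T tokens).
    Stage("pretraining", ["web_text"], 3.9e12),
    # Stage 2: continued training on a smaller, higher-quality mix.
    Stage(
        "high_quality_stage",
        ["academic_materials", "math_problem_sets", "instruction_data"],
        None,  # budget not stated in the announcement
    ),
]

for stage in curriculum:
    print(f"{stage.name}: sources={stage.sources}, tokens={stage.token_budget}")
```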

To keep training stable, the team made targeted adjustments to the model architecture and training workflow, preventing the performance degradation that can occur over long training runs.

Additionally, OLMo 2 leverages Tülu 3, Ai2's recently released open-source post-training recipe. Applied as the post-training stage, Tülu 3 brings OLMo 2's instruction-following performance on par with leading models.

The complete release includes evaluation frameworks and intermediate checkpoints, providing researchers with tools to thoroughly assess OLMo 2's performance and facilitate further improvements.

Researchers and developers can access OLMo 2 through Ai2's online platform or download it from Hugging Face. The model is distributed under the Apache 2.0 license, permitting anyone to freely use, study, modify, and build upon it.
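
For a quick start, the model can be loaded with the standard Hugging Face transformers API. The repository name in the sketch below is an assumption based on Ai2's naming for this release; verify the exact identifier on the Hugging Face hub.

```python
# Minimal sketch: loading an OLMo 2 checkpoint via the Hugging Face
# `transformers` library. The repository id below is an assumption based on
# Ai2's naming for this release; check the hub for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed id for the 7B base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Open language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```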