AMD has officially launched AMD OLMo, its first fully open-source large language model (LLM) series, with 1 billion parameters. Designed for diverse application scenarios, the models were pre-trained on AMD Instinct MI250 GPUs. Reportedly, AMD OLMo offers robust reasoning, instruction-following, and conversational capabilities.
This initiative aims to enhance AMD's position in the artificial intelligence industry and offer customers the opportunity to deploy these open-source models on AMD hardware. By making data, weights, training procedures, and code publicly available, AMD encourages developers not only to replicate these models but also to further innovate upon them. Beyond data center applications, AMD has enabled the OLMo model to be locally deployed on AMD Ryzen AI personal computers equipped with Neural Processing Units (NPUs), allowing developers to leverage AI models on individual devices effectively.
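To make local experimentation concrete, here is a minimal sketch of preparing a chat-style prompt for a small instruction-tuned model. The prompt markup and the Hugging Face model id in the comments are illustrative assumptions, not AMD's documented interface:

```python
# Sketch: flattening a conversation into a single prompt string.
# The <|role|> markup below is an illustrative convention, not
# necessarily the template AMD OLMo was trained with.

def build_chat_prompt(messages):
    """Flatten a list of {role, content} turns into one prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}")
    parts.append("<|assistant|>\n")  # cue the model to respond
    return "\n".join(parts)

prompt = build_chat_prompt([
    {"role": "user", "content": "Explain what an NPU is in one sentence."},
])

# To actually run a checkpoint locally (requires the transformers library
# and a downloaded model; the id below is an assumption):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("amd/AMD-OLMo-1B-SFT")
# model = AutoModelForCausalLM.from_pretrained("amd/AMD-OLMo-1B-SFT")
# out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=64)
# print(tok.decode(out[0], skip_special_tokens=True))
```

In practice the tokenizer's own chat template, if one is published with the checkpoint, should be preferred over a hand-rolled format like this.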
Multistage Training
The AMD OLMo model was trained on an extensive dataset containing 1.3 trillion tokens, utilizing 16 nodes, each equipped with four AMD Instinct MI250 GPUs (64 GPUs in total). The training process was divided into three phases:
- The initial AMD OLMo 1B model, a decoder-only transformer, was pre-trained on a subset of Dolma v1.7 with a next-token-prediction objective to capture language patterns and general knowledge.
- The second phase applied supervised fine-tuning (SFT): the pre-trained model was first fine-tuned on the Tulu V2 dataset, then further fine-tuned on the OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets to strengthen its instruction-following capabilities and improve performance on scientific, coding, and mathematical tasks.
- After fine-tuning, the AMD OLMo 1B SFT model was aligned to human preferences using Direct Preference Optimization (DPO) on the UltraFeedback dataset, producing the final AMD OLMo 1B SFT DPO model, which prioritizes outputs consistent with typical human feedback.
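The preference-alignment step above can be made concrete. Below is a minimal sketch of the DPO loss for a single preference pair, assuming summed per-sequence log-probabilities are already computed; this is illustrative math, not AMD's training code:

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are the log-probabilities of the chosen and rejected responses
    under the policy being trained and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)); shrinks as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A policy that favors the chosen response more than the reference does
# gets a loss below log(2), the zero-margin value.
loss_good = dpo_loss(-10.0, -30.0, -20.0, -25.0)
loss_zero = dpo_loss(-20.0, -25.0, -20.0, -25.0)
```

Minimizing this loss pushes the policy to assign relatively more probability to preferred responses without drifting too far from the reference (SFT) model, which is controlled by `beta`.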
Performance Results
In AMD's own tests, the AMD OLMo models performed strongly on standard benchmarks for general reasoning ability and multitask understanding, remaining competitive with similar-sized open-source models such as TinyLlama-1.1B, MobiLlama-1B, and OpenELM-1_1B.
- The two-stage SFT yielded significant accuracy gains, with MMLU scores improving by 5.09% and GSM8k by 15.32%, demonstrating the effectiveness of AMD's training methodology.
- The final AMD OLMo 1B SFT DPO model outperformed other open-source conversational models by at least 2.60% on average across various benchmark tests.
On conversational instruction-tuning benchmarks, AMD's OLMo 1B SFT and OLMo 1B SFT DPO models outperformed the next-best instruction-tuned models by 3.41% on the AlpacaEval 2 Win Rate metric and by 2.29% on the AlpacaEval 2 LC Win Rate metric. Additionally, on MT-Bench, which measures multi-turn conversational ability, the SFT DPO model scored 0.97% higher than its nearest competitor.
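For intuition on how a win-rate metric like AlpacaEval's is computed, here is a simplified sketch that counts ties as half a win; the length-controlled (LC) variant additionally corrects for response-length bias and is not reproduced here:

```python
def win_rate(judgments):
    """Percentage of head-to-head comparisons the model wins.

    `judgments` is a list of 'win' / 'loss' / 'tie' outcomes from a judge
    comparing the model's responses against a baseline's responses.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    score = sum(1.0 if j == "win" else 0.5 if j == "tie" else 0.0
                for j in judgments)
    return 100.0 * score / len(judgments)

# 6 wins, 2 ties, 2 losses out of 10 comparisons
rate = win_rate(["win"] * 6 + ["tie"] * 2 + ["loss"] * 2)
```

In the real benchmark the judgments come from a strong LLM judge comparing model outputs against a fixed baseline model's outputs on a standard prompt set.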
Furthermore, AMD evaluated the models on responsible-AI benchmarks, including ToxiGen (measuring toxic language, where lower scores are better), CrowS-Pairs (assessing bias), and TruthfulQA-mc2 (evaluating the truthfulness of responses). The results indicate that the AMD OLMo models perform comparably to similar models on ethical and responsible AI tasks.
The launch of AMD OLMo marks a significant advancement for AMD in the field of artificial intelligence, providing users and developers with more diverse and efficient AI solutions.