Llama 4 Scout and Maverick Now Available on Amazon Bedrock and SageMaker JumpStart

2025-05-19

AWS recently announced that Meta's latest foundation models, Llama 4 Scout and Llama 4 Maverick, are now available through Amazon Bedrock and Amazon SageMaker JumpStart. Both models are multimodal and use a Mixture of Experts (MoE) architecture.
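On Amazon Bedrock, the models can be invoked with the standard boto3 Converse API. The sketch below is a minimal example of that pattern; the model ID shown is an assumption, so check the Bedrock console for the exact identifier or cross-region inference profile available in your Region:

```python
import boto3

# Bedrock Runtime client; the Llama 4 models are available in selected Regions.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model ID; look up the exact Llama 4 Scout identifier
# (or inference profile ID) in the Amazon Bedrock console.
MODEL_ID = "meta.llama4-scout-17b-instruct-v1:0"

response = client.converse(
    modelId=MODEL_ID,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of a Mixture of Experts architecture."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```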

Initially launched by Meta in April 2025, Llama 4 Scout and Maverick each contain 17 billion active parameters, distributed across 16 and 128 experts respectively. Llama 4 Scout is optimized to run on a single NVIDIA H100 GPU for general-purpose tasks, while Llama 4 Maverick, according to Meta, delivers enhanced reasoning and coding capabilities and outperforms comparable models in its class. Amazon emphasizes that the Mixture of Experts architecture reduces computational costs, making advanced AI more accessible and cost-effective.

By leveraging a Mixture of Experts (MoE) architecture, a first for the Llama family, the models activate only the most relevant experts for each token. This makes both training and inference more computationally efficient, allowing customers to get higher performance while reducing costs.
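The following schematic sketch (plain NumPy, not Meta's implementation) illustrates the core idea behind MoE routing: a router scores all experts for each token, only the top-k experts are actually evaluated, and their outputs are combined using the normalized router scores:

```python
import numpy as np

def moe_layer(x, router_w, experts, k=1):
    """Schematic top-k Mixture of Experts routing for a single token.

    x        : (d,) token representation
    router_w : (d, n_experts) router weights
    experts  : list of callables, one per expert (e.g. small MLPs)
    k        : number of experts activated per token
    """
    # Router produces one score per expert.
    logits = x @ router_w
    # Keep only the k highest-scoring experts.
    top = np.argsort(logits)[-k:]
    # Normalize the selected scores into mixing weights.
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    # Only the selected experts run; the rest stay idle, which is
    # where the compute savings come from.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 4 random linear "experts", only 1 active per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
out = moe_layer(rng.normal(size=d), router_w, experts, k=1)
print(out.shape)  # (8,)
```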

While Llama 4 Scout supports context windows of up to 10 million tokens, Amazon Bedrock currently allows up to 3.5 million tokens with plans to expand soon. Llama 4 Maverick supports up to 1 million tokens. In both cases, these represent a significant increase over the 128K context window available in Llama 3 models.

On Amazon SageMaker JumpStart, you can use the new models with either SageMaker Studio or the Amazon SageMaker Python SDK, depending on your use case. Both models default to ml.p5.48xlarge instances equipped with NVIDIA H100 Tensor Core GPUs. Alternatively, you can choose ml.p5en.48xlarge instances powered by NVIDIA H200 Tensor Core GPUs. Llama 4 Scout also supports ml.g6e.48xlarge instances featuring NVIDIA L40S Tensor Core GPUs.
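A minimal deployment sketch with the SageMaker Python SDK could look like the following; the JumpStart model ID is an assumption, and the request payload format can vary with the serving container, so verify both in SageMaker Studio or the JumpStart model catalog before running it:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Assumed JumpStart model ID for Llama 4 Scout; verify the exact ID
# in SageMaker Studio or the JumpStart model catalog.
model = JumpStartModel(model_id="meta-vlm-llama-4-scout-17b-16e-instruct")

# Deploy on the default ml.p5.48xlarge (H100) instances; switch to
# ml.p5en.48xlarge (H200) or, for Scout, ml.g6e.48xlarge (L40S) if preferred.
predictor = model.deploy(
    instance_type="ml.p5.48xlarge",
    accept_eula=True,  # required for Meta's gated models
)

# Simple text-only invocation; Llama 4 also accepts image inputs.
response = predictor.predict({
    "inputs": "Explain the difference between Llama 4 Scout and Maverick.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.5},
})
print(response)
```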

The Llama 4 models are also available on multiple other cloud providers including Databricks, GroqCloud, Lambda.ai, and Cerebras Inference Cloud. Additionally, they can be accessed on Hugging Face.

Beyond Scout and Maverick, the Llama 4 family includes a third model, Behemoth, comprising 288 billion active parameters distributed across 16 experts. Meta describes Behemoth, which is still in preview, as its smartest model yet and uses it as a distillation teacher to train both Scout and Maverick.