AWS Launches New AI Factory for Sovereign AI On-Premises Deployment, Unveils Trainium3 and NVIDIA GB300

2025-12-03

Amazon Web Services (AWS) today unveiled a suite of updates to its artificial intelligence infrastructure, spanning sovereign on-premises solutions, next-generation custom AI accelerators, and its most advanced NVIDIA GPU instances to date—all aimed at solidifying its leadership in large-scale cloud and private AI deployments.

The announcements include the launch of AWS AI Factory, general availability of Amazon EC2 Trn3 UltraServers powered by the new Trainium3 chip, and the introduction of P6e-GB300 UltraServers based on NVIDIA’s Blackwell Ultra architecture via the GB300 NVL72 platform.

Among these offerings, AWS AI Factory is a new solution that delivers dedicated, full-stack AWS AI infrastructure directly into customers’ existing data centers.

This platform integrates NVIDIA accelerated computing, AWS Trainium chips, high-speed low-latency networking, energy-efficient infrastructure, and core AWS AI services such as Amazon Bedrock and Amazon SageMaker.

Built primarily for government and highly regulated industries, AWS AI Factory functions like a private AWS Region, delivering secure, low-latency compute, storage, and AI capabilities while ensuring strict data sovereignty and compliance. Customers provide their own facilities, power, and network connectivity, while AWS handles deployment, operations, and lifecycle management, compressing a buildout that would otherwise take years.
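Because AWS describes an AI Factory as behaving like a private AWS Region, existing application code should in principle carry over unchanged. Below is a minimal sketch of calling Amazon Bedrock through boto3; the Region name and model ID are ordinary public-cloud placeholders, since AWS has not published AI Factory endpoint details.

```python
# Minimal Bedrock invocation sketch. In an AI Factory deployment, the same
# call would presumably target the private Region's endpoint; the region
# name and model ID below are illustrative placeholders, not confirmed
# AI Factory values.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarize our data-residency policy."}],
    }),
)

# The response body is a stream; parse it as JSON and read the model's text.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```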

As part of the AI Factory announcement, AWS also highlighted its deep collaboration with NVIDIA on the platform, including support for Grace Blackwell and the upcoming Vera Rubin GPU architectures, as well as future integration of NVIDIA NVLink Fusion interconnect technology in Trainium4.

“Scaling AI requires a full-stack approach—from cutting-edge GPUs and networking to software and services optimized across every layer of the data center,” said Ian Buck, Vice President and General Manager of Hyperscale and High-Performance Computing at NVIDIA. “Together with AWS, we’re bringing all of this directly into customer environments.”

Trainium3 UltraServers

AWS also announced the general availability of Amazon EC2 Trn3 UltraServers, powered by its new 3-nanometer Trainium3 AI chips.

A single Trn3 UltraServer can scale up to 144 Trainium3 chips, delivering up to 4.4x higher compute performance, 4x better energy efficiency, and nearly 4x greater memory bandwidth compared to Trainium2.

Designed for next-generation workloads—including agentic AI, mixture-of-experts models, and large-scale reinforcement learning—these UltraServers feature AWS-engineered networking with inter-chip latency under 10 microseconds.
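For developers, Trainium chips are programmed through the AWS Neuron SDK rather than CUDA. The sketch below uses torch-neuronx to ahead-of-time compile a toy PyTorch model for NeuronCores; AWS has not published Trainium3-specific tooling details, so treat this as the generic Neuron workflow that Trn3 instances would presumably inherit.

```python
# A minimal sketch of compiling a PyTorch model for Trainium with the AWS
# Neuron SDK (torch-neuronx). Runs on a Trn-family instance with the Neuron
# runtime installed; Trainium3-specific behavior is an assumption here.
import torch
import torch_neuronx

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(512, 1024),
            torch.nn.GELU(),
            torch.nn.Linear(1024, 512),
        )

    def forward(self, x):
        return self.net(x)

model = TinyMLP().eval()
example = torch.rand(8, 512)

# torch_neuronx.trace ahead-of-time compiles the model into a Neuron
# executable that runs on the instance's NeuronCores.
compiled = torch_neuronx.trace(model, example)
print(compiled(example).shape)
```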

In tests using OpenAI Group PBC’s open-weight model GPT-OSS, AWS customers achieved 3x higher throughput per chip and 4x faster inference response times compared to the previous generation. Early adopters—including Anthropic PBC, Karakuri Ltd., Metagenomi Inc., Neto.ai Inc., Ricoh Company Ltd., and Splash Music Inc.—have reported up to 50% reductions in training and inference costs.

AWS also provided a preview of Trainium4, which is expected to deliver significant improvements in FP4 and FP8 performance as well as memory bandwidth.

NVIDIA GB300

As part of its AI infrastructure rollout, AWS introduced new P6e-GB300 UltraServers built on NVIDIA’s GB300 NVL72 platform, bringing the most advanced NVIDIA GPU architecture yet available on Amazon EC2.

These instances offer the highest GPU memory and compute density on AWS, optimized for trillion-parameter AI inference and advanced reasoning models in production environments.

Running on the AWS Nitro System, P6e-GB300 systems are tightly integrated with services like Amazon Elastic Kubernetes Service (EKS), enabling customers to securely and efficiently deploy large-scale inference workloads.
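The announcement gives no deployment recipe, but on EKS such a workload would typically be scheduled like any other GPU-backed node group. The sketch below uses the official Kubernetes Python client; the instance-type label value and container image are illustrative assumptions, and GPU requests use the standard NVIDIA device plugin resource key.

```python
# A minimal sketch of scheduling an inference Deployment onto P6e-GB300
# capacity in EKS. The instance-type value ("p6e-gb300.36xlarge") and the
# container image are assumptions for illustration.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

container = client.V1Container(
    name="llm-inference",
    image="my-registry/llm-server:latest",  # hypothetical serving image
    resources=client.V1ResourceRequirements(
        # GPUs are exposed to pods via the NVIDIA device plugin.
        limits={"nvidia.com/gpu": "8"},
    ),
)

pod_spec = client.V1PodSpec(
    containers=[container],
    # Pin pods to GB300 nodes; the label value is illustrative.
    node_selector={"node.kubernetes.io/instance-type": "p6e-gb300.36xlarge"},
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "llm-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-inference"}),
            spec=pod_spec,
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```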