Amazon Web Services (AWS) recently unveiled Project Rainier, a massive compute cluster powered by hundreds of thousands of its custom Trainium2 chips.
AWS is building the system to support the work of AI developer Anthropic PBC. Amazon, the parent company of AWS, has invested $8 billion in the OpenAI competitor since September last year. A few weeks ago, Anthropic announced that it will collaborate with AWS on the development of future Trainium chips.
The Trainium2 chip features eight so-called NeuronCores, each comprising four computing modules. One of these modules is the GPSIMD engine, which is optimized specifically for running custom AI operations. These operations are specialized, low-level pieces of code that machine learning teams write to squeeze more performance out of their neural networks.
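To give a sense of what such a custom operation looks like in practice, the sketch below defines a fused bias-add-plus-GELU operation using PyTorch's generic custom-function interface. It is only an illustration of the kind of hand-written operator that engines like GPSIMD are built to run: the class name `FusedBiasGelu` is hypothetical, and the actual Trainium programming interfaces come from AWS's Neuron SDK, which is not shown here.

```python
import torch

# Illustrative only: a generic PyTorch custom operation of the kind ML teams
# hand-write and then map onto an accelerator's custom-op engine. This is NOT
# the AWS Neuron API used to target Trainium's GPSIMD engine.
class FusedBiasGelu(torch.autograd.Function):
    """Fuses a bias add and a GELU activation into one custom operation."""

    @staticmethod
    def forward(ctx, x, bias):
        y = x + bias
        ctx.save_for_backward(y)
        return torch.nn.functional.gelu(y)

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        # Recompute the GELU derivative with respect to (x + bias).
        with torch.enable_grad():
            y = y.detach().requires_grad_(True)
            out = torch.nn.functional.gelu(y)
            (grad_y,) = torch.autograd.grad(out, y, grad_out)
        return grad_y, grad_y.sum(dim=0)  # gradients w.r.t. x and bias

# Minimal usage check.
x = torch.randn(8, 128, requires_grad=True)
bias = torch.randn(128, requires_grad=True)
out = FusedBiasGelu.apply(x, bias)
out.sum().backward()
```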
The eight NeuronCores are supported by 96 GB of high-bandwidth memory (HBM), which operates significantly faster than other types of RAM. The Trainium2 chip can transfer data between the HBM pool and the NeuronCores at speeds of up to 2.8 terabytes per second. The faster data reaches the chip's processing units, the sooner computations can begin.
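As a back-of-envelope illustration of what that bandwidth figure means, the snippet below estimates how long a single full pass over the 96 GB memory pool would take at the quoted 2.8 TB/s. Real-world memory utilization is lower, so this is only an idealized lower bound based on the numbers above.

```python
# Back-of-envelope, using the figures quoted in the article: 96 GB of HBM
# read at 2.8 TB/s of memory bandwidth.
hbm_capacity_gb = 96
bandwidth_tb_per_s = 2.8

seconds_per_full_read = hbm_capacity_gb / (bandwidth_tb_per_s * 1000)
print(f"One full pass over HBM: ~{seconds_per_full_read * 1000:.1f} ms")
# -> roughly 34 ms per complete sweep of the 96 GB pool, in the ideal case
```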
Hundreds of thousands of Trainium2 chips in Project Rainier are assembled into what are called Trn2 UltraServers. Those AWS-developed servers were announced alongside the compute cluster today. Each machine houses 64 Trainium2 chips and delivers an aggregate 332 petaflops when running sparse FP8 operations, a type of computation that AI models use to process data.
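Dividing the headline number across the server's chips gives a rough sense of per-chip throughput. The calculation below is purely illustrative and uses only the figures quoted above.

```python
# Spreading the UltraServer's headline sparse FP8 figure across its chips.
ultraserver_pflops = 332
chips_per_server = 64

per_chip_pflops = ultraserver_pflops / chips_per_server
print(f"~{per_chip_pflops:.1f} petaflops of sparse FP8 compute per Trainium2 chip")
# -> ~5.2 petaflops per chip, before any real-world efficiency losses
```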
Rather than deploying Project Rainier's servers in a single data center, as is typical for AI clusters, AWS has opted to distribute them across multiple facilities. This strategy simplifies logistical tasks, such as securing the necessary power supply for the cluster.
Distributing hardware across multiple facilities offers clear benefits, but it also comes at a cost, most notably increased latency. The greater the distance between servers in the cluster, the longer it takes for data to travel between them. Because the servers in an AI cluster frequently exchange information with one another, this added latency can significantly slow down processing.
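A rough way to quantify that penalty: signals in optical fiber travel at roughly two-thirds the speed of light, so every kilometer of separation adds about five microseconds of one-way delay. The sketch below works through a few distances; the fiber-speed rule of thumb is a general assumption, not an AWS figure.

```python
# Rough one-way latency added by physical distance between facilities.
# Assumes signals travel through optical fiber at about two-thirds the
# speed of light (~200,000 km/s), a standard rule of thumb.
fiber_speed_km_per_s = 200_000

for distance_km in (1, 10, 100):
    one_way_ms = distance_km / fiber_speed_km_per_s * 1000
    print(f"{distance_km:>4} km apart -> ~{one_way_ms:.3f} ms added per hop, one way")
# Every extra round trip between facilities multiplies this delay across the
# millions of synchronization steps in a large training run.
```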
AWS has addressed this limitation with an internally developed technology called Elastic Fabric Adapter, or EFA. The network interface accelerates the flow of data between the company's AI chips.
Transferring information between two different servers involves numerous computational operations, some of which are performed by the servers' operating systems. AWS's Elastic Fabric Adapter bypasses the operating system, allowing network traffic to reach its destination more quickly.
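For a sense of the overhead that operating-system bypass avoids, the sketch below times a message round trip over ordinary TCP sockets on a single machine, where every send and receive passes through the kernel's network stack. It illustrates the conventional, kernel-mediated path only; it does not use or measure EFA itself.

```python
import socket
import threading
import time

# Illustration only: per-message cost of the ordinary, kernel-mediated socket
# path that OS-bypass networking is designed to avoid.
def echo_server(sock):
    conn, _ = sock.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # let the OS pick a free port
server.listen(1)
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
client.sendall(b"x")                   # warm up the connection
client.recv(64)

rounds = 10_000
start = time.perf_counter()
for _ in range(rounds):
    client.sendall(b"x")
    client.recv(64)
elapsed = time.perf_counter() - start
print(f"~{elapsed / rounds * 1e6:.1f} microseconds per kernel-mediated round trip")
client.close()
```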
The adapter manages traffic with the help of libfabric, an open-source networking framework that is used not only for AI workloads but also for other demanding applications, such as scientific simulations.
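On an instance where the EFA software is installed, libfabric's bundled fi_info command-line utility can confirm that the EFA provider is visible. The small wrapper below simply shells out to that tool; the "-p efa" flag for selecting the provider follows AWS's EFA setup documentation, and the check is only a convenience sketch.

```python
import shutil
import subprocess

# Quick sanity check that libfabric sees the EFA provider on this machine.
if shutil.which("fi_info") is None:
    print("libfabric's fi_info utility is not installed on this machine")
else:
    # "-p efa" restricts the listing to the EFA provider.
    result = subprocess.run(["fi_info", "-p", "efa"], capture_output=True, text=True)
    print(result.stdout or result.stderr)
```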
AWS expects to complete Project Rainier's construction by next year. Once operational, the system will become one of the world's largest compute clusters for training AI models. AWS claims it will deliver more than five times the performance of the systems Anthropic has used so far to develop its language models.
AWS announced Project Rainier approximately a year after revealing another large-scale AI cluster initiative.
That system, named Project Ceiba, uses Nvidia chips instead of Trainium2 processors. AWS initially planned to equip the supercomputer with 16,384 Nvidia GH200 chips, but this past March it switched to a configuration of 20,736 Blackwell B200 chips that is expected to deliver a sixfold performance increase.
Project Ceiba will support Nvidia's internal engineering efforts. The chipmaker plans to use the system for initiatives in areas such as language model research, biology, and autonomous driving.