Microsoft and NVIDIA have announced that Azure is now running the first large-scale production cluster built on NVIDIA's GB300 NVL72, purpose-built to serve larger, more capable reasoning models and to speed their rollout. The system combines rack-level shared memory, fifth-generation NVLink, and 800 Gb/s networking so that dozens of cabinets behave as a single massive accelerator.
The GB300 NVL72 is NVIDIA's rack-scale platform for inference-heavy, multi-step workloads, where models spend far more compute at serving time working through a task. Each rack links 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single 72-GPU NVLink domain, giving them a shared pool of fast memory and 130 TB/s of intra-rack bandwidth to support massive contexts and extended chain-of-thought reasoning.
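As a sanity check on the quoted intra-rack figure, the arithmetic below multiplies the rack's 72 GPUs by NVIDIA's published per-GPU NVLink 5 rate of 1.8 TB/s; that per-GPU rate is the only number assumed beyond what the paragraph states.

```python
# Back-of-the-envelope check of the quoted 130 TB/s intra-rack figure,
# assuming NVIDIA's published NVLink 5 rate of 1.8 TB/s per GPU.
GPUS_PER_RACK = 72
NVLINK5_TBPS_PER_GPU = 1.8  # TB/s of NVLink bandwidth per Blackwell Ultra GPU

aggregate_tbps = GPUS_PER_RACK * NVLINK5_TBPS_PER_GPU
print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.0f} TB/s")  # ~130 TB/s
```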
Azure's cluster interconnects racks via NVIDIA's Quantum-X800 InfiniBand, offering up to 800 Gb/s of network throughput per GPU. This lets OpenAI scale prefill and decode across racks while keeping latency low, which is crucial for interactive agents and long-context workloads.
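To make the prefill/decode split concrete, here is a minimal, purely illustrative sketch of disaggregated serving: one worker runs the compute-heavy prefill pass over the prompt and hands off a KV cache, another streams tokens against it. The worker functions and KVCache type are hypothetical and stand in for whatever scheduler Azure and OpenAI actually run.

```python
# Conceptual sketch of disaggregated serving: prefill runs once per request to
# build the KV cache, decode then streams tokens from that cache. The names
# and the hand-off step are illustrative, not the production scheduler.
from dataclasses import dataclass

@dataclass
class KVCache:
    request_id: str
    num_tokens: int  # prompt tokens whose keys/values are cached

def prefill_worker(request_id: str, prompt_tokens: list[int]) -> KVCache:
    """Attend over the full prompt once (the compute-bound phase)."""
    return KVCache(request_id=request_id, num_tokens=len(prompt_tokens))

def decode_worker(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Generate tokens one at a time against the cache (the bandwidth-bound phase)."""
    generated = []
    for step in range(max_new_tokens):
        # A real system reads the KV cache over NVLink/InfiniBand each step and
        # appends one new entry; here we just emit a placeholder token id.
        generated.append(step)
        cache.num_tokens += 1
    return generated

cache = prefill_worker("req-1", prompt_tokens=list(range(4096)))
tokens = decode_worker(cache, max_new_tokens=8)
print(len(tokens), cache.num_tokens)
```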
A single NVL72 rack aggregates roughly 21 TB of HBM3e across its 72 GPUs and, counting CPU-GPU coherent memory, reaches around 40 TB of "fast" memory. That capacity lets larger models and longer prompts stay resident without frequent offloading, which translates into more tokens per second and fewer stalls.
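A rough reconciliation of those memory figures, assuming the commonly cited capacities of about 288 GB of HBM3e per Blackwell Ultra GPU and about 480 GB of LPDDR5X per Grace CPU (both assumptions, not numbers from the announcement):

```python
# Rough reconciliation of the per-rack memory figures, assuming ~288 GB HBM3e
# per Blackwell Ultra GPU and ~480 GB LPDDR5X per Grace CPU (both assumed).
GPUS, CPUS = 72, 36
HBM3E_GB_PER_GPU = 288
LPDDR_GB_PER_CPU = 480

hbm_tb = GPUS * HBM3E_GB_PER_GPU / 1000             # ~20.7 TB, i.e. "approximately 21 TB"
fast_tb = hbm_tb + CPUS * LPDDR_GB_PER_CPU / 1000   # ~38 TB, i.e. "around 40 TB"
print(f"HBM3e: {hbm_tb:.1f} TB, CPU+GPU coherent fast memory: {fast_tb:.1f} TB")
```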
In benchmark tests, NVIDIA's GB300 NVL72 recently set records on the new benchmarks added in MLPerf Inference v5.1, including higher DeepSeek-R1 throughput than clusters built on the previous Blackwell generation (GB200). The platform is also tuned for FP4 formats such as NVFP4 and for Dynamo-style disaggregated serving, choices aimed at reducing inference costs for large-scale models.
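The appeal of 4-bit formats is easy to see from a footprint estimate: weight bytes scale linearly with bits per parameter, so moving from FP8 to FP4 roughly halves the memory and bandwidth needed just to hold and stream weights (ignoring the small overhead of block scale factors). The 671B parameter count used below is DeepSeek-R1's commonly cited size and serves only as an illustration.

```python
# Why 4-bit formats such as NVFP4 matter at this scale: weight footprint scales
# linearly with bits per parameter. The 671B figure is DeepSeek-R1's commonly
# cited parameter count, used here only as an illustration; per-block scale
# factors in NVFP4 add a small overhead not counted below.
def weight_footprint_tb(params: float, bits_per_param: int) -> float:
    return params * bits_per_param / 8 / 1e12  # bytes -> TB

PARAMS = 671e9
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4/NVFP4", 4)]:
    print(f"{name:10s}: {weight_footprint_tb(PARAMS, bits):.2f} TB of weights")
```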
At this scale, power delivery and grid stability become first-order engineering concerns. The GB300 introduces rack-level power units with onboard energy storage and coordinated power smoothing, cutting peak grid demand by up to 30%, which matters most when thousands of GPUs ramp up or down in sync. More grid-aware designs like this are expected in hyperscale AI builds.
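The toy model below illustrates the idea; it is not NVIDIA's actual control scheme, and every number in it is made up. Grid draw is capped at a fixed level, onboard storage discharges to cover synchronized peaks and recharges in the troughs, and the peak pulled from the grid drops by roughly the advertised margin.

```python
# Toy model of rack-level power smoothing: instantaneous draw swings as
# synchronized GPU work ramps, while onboard energy storage caps what is
# pulled from the grid and recharges in the troughs. All numbers illustrative.
synced_load_kw = [80, 80, 140, 140, 80, 140, 140, 80]  # hypothetical rack draw per interval
GRID_CAP_KW = 100            # cap on grid draw enforced by the power unit (assumed)
stored_energy = 200          # onboard storage, in kW-intervals, for simplicity

grid_draw = []
for load in synced_load_kw:
    if load > GRID_CAP_KW and stored_energy > 0:
        # Discharge storage to shave the peak down to the cap.
        discharge = min(load - GRID_CAP_KW, stored_energy)
        stored_energy -= discharge
        grid_draw.append(load - discharge)
    else:
        # Recharge with headroom below the cap when the load dips.
        recharge = min(GRID_CAP_KW - load, 20) if load < GRID_CAP_KW else 0
        stored_energy += recharge
        grid_draw.append(load + recharge)

peak_without, peak_with = max(synced_load_kw), max(grid_draw)
print(f"Peak grid draw: {peak_without} kW -> {peak_with} kW "
      f"({(1 - peak_with / peak_without):.0%} lower)")
```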
This initiative fits a broader capacity strategy. Microsoft has been securing supply, both within its own data centers and through "neocloud" partners, to keep GPU-intensive projects fed even as it prepares to roll out the new GB300 globally. That procurement approach, combined with Azure's existing GB200 deployments, reflects Microsoft's plan to shorten training cycles and bring larger models to market faster.