Google Cloud Upgrades Kubernetes Engine to Meet Large Language Model Demands

2024-11-14

As generative AI models continue to grow, some have reached the two-trillion-parameter level, driving a surge in the compute and storage that large language models demand.

Google Cloud recently announced an upgrade to Google Kubernetes Engine (GKE) to meet the demands of these larger models. GKE now supports clusters of up to 65,000 nodes, more than four times the previous limit of 15,000, giving it the scale and compute to handle some of the world's most complex and resource-intensive AI workloads.

Training models with trillions of parameters requires clusters of more than 10,000 nodes running AI accelerators. Parameters are the variables inside an AI model that control its behavior and predictive ability; increasing their number can improve prediction accuracy. They function like knobs or switches that developers adjust to tune performance and precision.
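To make that scale concrete, here is a rough back-of-the-envelope calculation (not from the article); the bytes-per-parameter and per-accelerator memory figures below are illustrative assumptions:

```python
# Back-of-the-envelope sizing for a 2-trillion-parameter model.
# Assumed figures: 2 bytes per weight (bf16); ~16 bytes of training
# state per parameter (weights + gradients + optimizer state, a common
# rule of thumb for Adam in mixed precision); 80 GB of memory per
# accelerator.

PARAMS = 2 * 10**12          # 2 trillion parameters
WEIGHT_BYTES = 2             # bf16 precision (assumption)
TRAIN_BYTES = 16             # per-parameter training state (assumption)
ACCEL_MEM = 80 * 10**9       # 80 GB HBM per accelerator (assumption)

print(f"Weights alone: {PARAMS * WEIGHT_BYTES / 1e12:.0f} TB")   # ~4 TB
print(f"Training state: {PARAMS * TRAIN_BYTES / 1e12:.0f} TB")   # ~32 TB
print(f"Accelerators to hold state: {PARAMS * TRAIN_BYTES / ACCEL_MEM:.0f}")
```

Memory alone sets a floor of only a few hundred accelerators; the tens of thousands of nodes come from needing enough aggregate compute to finish training in a practical amount of time.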

The Senior Director of Kubernetes and Serverless Products at Google Cloud said that as large language models (LLMs) continue to grow in scale, running them efficiently requires exceptionally large clusters, and those clusters must also be reliable and scalable enough to handle the demands of LLM training workloads.

GKE is Google's managed Kubernetes service, designed to simplify running containerized environments. It can automatically add or remove hardware resources, such as dedicated AI chips or GPUs, as workload demands change, and it handles Kubernetes upgrades and other maintenance tasks.
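As a minimal sketch of what that looks like in practice, the snippet below uses the google-cloud-container Python client to add an autoscaled GPU node pool to an existing cluster; the project, cluster, machine type, and node counts are hypothetical placeholders:

```python
# Minimal sketch: an autoscaled GPU node pool on GKE via the
# google-cloud-container client. All names and sizes are placeholders.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

# Parent path of an existing cluster (hypothetical project/cluster).
parent = "projects/my-project/locations/us-central1/clusters/llm-training"

node_pool = container_v1.NodePool(
    name="gpu-pool",
    initial_node_count=0,
    config=container_v1.NodeConfig(
        machine_type="a2-highgpu-8g",  # accelerator-optimized VM (assumed)
        accelerators=[
            container_v1.AcceleratorConfig(
                accelerator_count=8,
                accelerator_type="nvidia-tesla-a100",
            )
        ],
    ),
    # GKE adds or removes nodes within these bounds as demand changes.
    autoscaling=container_v1.NodePoolAutoscaling(
        enabled=True,
        min_node_count=0,
        max_node_count=128,
    ),
)

operation = client.create_node_pool(
    request=container_v1.CreateNodePoolRequest(parent=parent, node_pool=node_pool)
)
print(f"Node pool creation started: {operation.name}")
```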

The new 65,000-node clusters can manage AI models distributed across 250,000 Tensor Processing Units (TPUs), Google's specialized processors for accelerating machine learning and generative AI workloads. That is a fivefold increase in TPU chips per GKE cluster, up from the previous 50,000.

This upgrade significantly improves the reliability and efficiency of running large-scale AI workloads. Scale matters for big training and inference jobs because Kubernetes can route around hardware failures without downtime, and the extra capacity lets more model iterations run in a given time frame, so jobs finish sooner.
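One way that failure handling surfaces to users: a Kubernetes Job automatically replaces failed pods on healthy nodes. Here is a minimal sketch using the official kubernetes Python client, with placeholder names and image:

```python
# Minimal sketch: a Kubernetes Job whose failed pods are recreated
# (up to backoff_limit) on healthy nodes, so a hardware failure does
# not kill the whole run. Names and image are placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the GKE cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="training-step"),
    spec=client.V1JobSpec(
        backoff_limit=10,  # retry failed pods up to 10 times
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",  # let the Job controller reschedule
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="gcr.io/my-project/trainer:latest",  # placeholder
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```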

To achieve this upgrade, Google Cloud is migrating GKE from the open-source etcd (a distributed key-value store) to a more robust system based on Google's distributed database, Spanner. This will enable GKE clusters to handle nearly limitless scale and offer lower latency.

Google has also made significant improvements to the GKE infrastructure, greatly increasing how quickly it can scale so customers can meet demand faster. A single cluster can now run five jobs at once, each matching the record scale Google Cloud previously achieved when training LLMs.

The upgrade is driven by customers' focus on AI and its rapid, widespread adoption across the industry. Google Cloud customers, including cutting-edge AI model developers such as Anthropic PBC, have been using GKE's clustering capabilities to train their models.

Reportedly, TPU and GPU usage on GKE has grown by 900% over the past year. That growth reflects the rapid advance of AI, which is expected to account for the vast majority of Kubernetes Engine usage in the future.