Cloudflare Acquires AI Deployment Startup Replicate

2025-11-19

Cloudflare has acquired Replicate, a startup that offers software designed to streamline the deployment of artificial intelligence models into production environments.

The two companies announced the acquisition today but did not disclose financial details. Prior to the deal, Replicate had raised more than $23 million from investors including Y Combinator and Sequoia Capital.

Large language models (LLMs) rely on various supporting components to function effectively. These typically include libraries such as NVIDIA's cuDNN, which provides foundational building blocks like attention mechanisms. AI models also commonly depend on Python code, since Python is the preferred language for developing AI workloads.
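
To make that stack concrete, the sketch below loads a small open-source model in Python: Hugging Face Transformers sits on top of PyTorch, which in turn is built against GPU primitives such as cuDNN. The distilgpt2 model name is purely illustrative.

```python
# Illustrative only: a tiny text-generation model loaded through the typical
# Python stack (Transformers on top of PyTorch, PyTorch on top of cuDNN).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# PyTorch reports whether the cuDNN primitives it was built against are usable.
print("cuDNN available:", torch.backends.cudnn.is_available())

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

inputs = tokenizer("Containers make deployment", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```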

Manually configuring all necessary components for an LLM can take hours. To accelerate this process, software teams often package LLMs along with their dependencies into containers. This allows developers to deploy pre-built containers in production instead of setting up each component individually.

San Francisco–based Replicate maintains an AI model catalog featuring containerized versions of over 50,000 models. The company creates these containers with Cog, an internal tool it open-sourced in 2019. While packaging a model and its dependencies significantly speeds up deployment, doing that packaging by hand can still be time-consuming; Cog automates much of the workflow.
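
For illustration, a Cog predictor is an ordinary Python class. The sketch below follows Cog's documented BasePredictor interface, with the model kept deliberately small; a cog.yaml file alongside it would declare the Python version and package dependencies.

```python
# predict.py — a minimal Cog predictor sketch. Cog wraps this class in a
# container and exposes predict() as an HTTP endpoint.
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts: load weights here so each
        # request doesn't pay the loading cost.
        from transformers import pipeline
        self.generator = pipeline("text-generation", model="distilgpt2")

    def predict(self, prompt: str = Input(description="Text prompt")) -> str:
        # Cog derives the API schema from this signature.
        result = self.generator(prompt, max_new_tokens=64)
        return result[0]["generated_text"]
```

Per Cog's documentation, running `cog build` then produces a container image with the model weights, its Python dependencies, and the GPU libraries baked in.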

Replicate enables customers to deploy its containerized models on a managed cloud platform that also supports custom LLMs, eliminating the need for developers to manage infrastructure directly. Pricing is usage-based.
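
As a sketch of what usage-based, infrastructure-free deployment looks like from the developer's side, the snippet below calls a catalog model through Replicate's official Python client. The model slug is illustrative, and REPLICATE_API_TOKEN is assumed to be set in the environment.

```python
# A sketch of invoking a Replicate-hosted model: no servers or GPUs to manage,
# and billing is based on the compute the call consumes.
import replicate

output = replicate.run(
    "meta/meta-llama-3-8b-instruct",  # illustrative model slug from the catalog
    input={"prompt": "Explain container images in one sentence."},
)
# Language models stream their output, so join the pieces.
print("".join(output))
```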

Cloudflare plans to migrate Replicate’s platform onto its own infrastructure—a move expected to enhance both reliability and performance. Additionally, Cloudflare intends to leverage the acquired technology to strengthen its Workers AI service.

Like Replicate, Cloudflare Workers lets developers run code in the cloud without managing the underlying hardware, which is spread across Cloudflare's global network of data centers. When a user sends a request to a Workers application, Cloudflare routes it through the nearest data center to minimize latency.

Workers AI is a specialized version of this platform optimized for machine learning tasks. Cloudflare aims to expand its ready-to-use AI model catalog by integrating Replicate’s library of containerized models. The company also plans to introduce support for running custom LLMs and fine-tuning open-source models.
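
For comparison with the Replicate workflow above, the sketch below calls a model already in the Workers AI catalog through Cloudflare's REST API. The endpoint shape follows Cloudflare's documentation, and the account ID, API token, and model slug are placeholders.

```python
# A sketch of running a Workers AI catalog model over the REST API.
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]   # placeholder credentials
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@cf/meta/llama-3.1-8b-instruct"   # placeholder catalog model

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Say hello in five words."}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["response"])
```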

Further development efforts will focus on enhancing Cloudflare’s AI Gateway service. This tool enables developers to cache responses to frequently asked user prompts, avoiding redundant generation of identical outputs. AI Gateway also functions as an observability solution for LLMs.
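
One common way to use AI Gateway is to point an OpenAI-compatible client at a gateway URL so repeated prompts can be answered from cache. The sketch below assumes Cloudflare's documented gateway URL pattern, with the account and gateway IDs as placeholders.

```python
# A sketch of routing requests through an AI Gateway endpoint instead of
# calling the upstream provider directly; the gateway can cache responses
# and log requests for observability.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # upstream provider key (placeholder)
    base_url="https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai",
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative upstream model
    messages=[{"role": "user", "content": "What is edge caching?"}],
)
print(reply.choices[0].message.content)
```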

“We’re deeply integrating our unified inference platform with AI Gateway to provide you with a single control plane for observability, prompt management, A/B testing, and cost analysis across all your models—whether they run on Cloudflare, Replicate, or any other provider,” wrote Rita Kozlov, Vice President at Cloudflare, and Ben Firshman, CEO of Replicate, in a joint blog post.