Cloudflare has announced the public beta of the Cloudflare Data Platform, a managed service built on open standards such as Apache Iceberg for ingesting, storing, and querying analytical data tables.
Earlier this year, Cloudflare launched the public beta of R2 Data Catalog, a managed Apache Iceberg catalog built on top of its R2 object storage. The company has now unified Cloudflare Pipelines, R2 Data Catalog, and R2 SQL into a cohesive offering: the Cloudflare Data Platform. Micah Wylde, Principal Engineer at Cloudflare; Alex Graham, Senior Systems Engineer; and Jérôme Schneider, Software Engineer, explained:
Analytical data is essential for modern businesses—it reveals user behavior, measures organizational performance, and flags potential issues. Yet traditional data infrastructure is costly and complex, often requiring dedicated cloud resources and in-house expertise. We built the Cloudflare Data Platform to be simple, accessible to everyone, and affordable—priced purely on usage.
Cloudflare Pipelines captures events sent via Workers or HTTP, processes them using SQL, and stores the output either in Iceberg tables or as files in R2. The R2 Data Catalog manages Iceberg metadata and now also handles routine maintenance tasks like compaction to accelerate query performance. Meanwhile, R2 SQL serves as a distributed, serverless query engine designed for petabyte-scale datasets stored in R2. Micah Wylde, former co-founder and CEO of Arroyo, added on LinkedIn:
Six months ago, Arroyo was acquired by Cloudflare. At the time, many wondered—why would Cloudflare need a stream processing engine? The answer is clear: we’re building a data platform. Just as the Cloudflare Developer Platform empowers millions of developers with fully serverless infrastructure to build, operate, and scale applications, the Cloudflare Data Platform applies the same philosophy to make analytical data infrastructure universally accessible.
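The compaction that R2 Data Catalog now performs is only described at a high level above. Streaming ingestion tends to produce many small files, and compaction rewrites them into fewer, larger ones so that queries open fewer objects. The TypeScript sketch below illustrates the general bin-packing idea behind Iceberg-style compaction. It is not Cloudflare's implementation: `DataFile`, `planCompaction`, the 512 MiB target (chosen to match Iceberg's default `write.target-file-size-bytes`), and the 75% small-file threshold are all assumptions made for illustration.

```typescript
// Simplified, hypothetical view of a data file tracked in table metadata.
interface DataFile {
  path: string;
  sizeBytes: number;
}

// Assumed target size for compacted files (512 MiB). This is an illustrative
// choice, not necessarily the value R2 Data Catalog uses.
const TARGET_FILE_SIZE = 512 * 1024 * 1024;

/**
 * Hypothetical planner: groups small files into bins whose combined size stays
 * within `target`, using first-fit decreasing. Each returned group would be
 * rewritten as one larger file.
 */
function planCompaction(files: DataFile[], target: number = TARGET_FILE_SIZE): DataFile[][] {
  // Assumed threshold: only files under 75% of the target are rewrite candidates.
  const small = files.filter((f) => f.sizeBytes < target * 0.75);
  // Place the largest files first so smaller files fill the remaining gaps.
  const sorted = [...small].sort((a, b) => b.sizeBytes - a.sizeBytes);

  const bins: { files: DataFile[]; total: number }[] = [];
  for (const file of sorted) {
    // First existing bin with enough room left for this file.
    const bin = bins.find((b) => b.total + file.sizeBytes <= target);
    if (bin) {
      bin.files.push(file);
      bin.total += file.sizeBytes;
    } else {
      bins.push({ files: [file], total: file.sizeBytes });
    }
  }
  // Rewriting a bin that holds a single file gains nothing, so skip those.
  return bins.filter((b) => b.files.length > 1).map((b) => b.files);
}

// Example: small files left behind by a streaming writer, plus one large file.
const MiB = 1024 * 1024;
const files: DataFile[] = [
  { path: "a.parquet", sizeBytes: 300 * MiB },
  { path: "b.parquet", sizeBytes: 200 * MiB },
  { path: "c.parquet", sizeBytes: 10 * MiB },
  { path: "d.parquet", sizeBytes: 5 * MiB },
  { path: "e.parquet", sizeBytes: 600 * MiB }, // already large, not a candidate
];
const plan = planCompaction(files);
console.log(plan.map((group) => group.map((f) => f.path)));
```

With these inputs, the planner groups a, b, and c into a single 510 MiB rewrite, leaves d alone in a bin that is then skipped, and ignores e because it already exceeds the threshold. First-fit decreasing is a deliberately simple heuristic; a production planner would also account for partitioning and the cost of each rewrite.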
SQL transformations already cover use cases such as schema enforcement, data normalization, and redacting sensitive information before storage, but Pipelines currently supports only stateless transformations. Cloudflare plans to bring more of Arroyo's stateful processing capabilities to the platform, enabling aggregations, incrementally updated materialized views, and joins. Jamie Lord, Solutions Architect at CDS UK, highlighted a key advantage of the new platform, Cloudflare's signature "zero egress fees" for data access:
Zero egress fees fundamentally reshape the economics of data warehousing. Cloudflare’s new data platform leverages this advantage to challenge AWS and Google’s dominance in analytics workloads. It addresses a simple but costly reality: companies hemorrhage money on data transfer fees. A single petabyte-scale operation can incur millions annually just moving analytical data across regions. Cloudflare eliminates that cost entirely.
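To make the distinction between stateless and stateful transformations in Pipelines concrete, the TypeScript sketch below contrasts the two. Pipelines expresses its transformations in SQL, so this is an analogy rather than Pipelines code; the event shape, `transform`, and `CountryTotals` are hypothetical. The stateless function handles each record in isolation (schema enforcement, normalization, redaction), while the running total must remember earlier records, which is the kind of state that aggregations, materialized views, and joins require.

```typescript
// Hypothetical incoming event; fields are untyped until validated.
interface RawEvent {
  userId?: unknown;
  email?: unknown;
  country?: unknown;
  amount?: unknown;
}

// Hypothetical cleaned record written to storage.
interface CleanEvent {
  userId: string;
  emailDomain: string; // full address is redacted, only the domain is kept
  country: string; // normalized to trimmed upper case
  amount: number;
}

// Stateless: the output depends only on the current record, so events can be
// processed independently, in any order, on any worker.
function transform(e: RawEvent): CleanEvent | null {
  // Schema enforcement: drop records missing required, correctly typed fields.
  if (
    typeof e.userId !== "string" ||
    typeof e.email !== "string" ||
    typeof e.country !== "string" ||
    typeof e.amount !== "number"
  ) {
    return null;
  }
  const at = e.email.lastIndexOf("@");
  return {
    userId: e.userId,
    emailDomain: at >= 0 ? e.email.slice(at + 1).toLowerCase() : "",
    country: e.country.trim().toUpperCase(),
    amount: e.amount,
  };
}

// Stateful: a per-country running total must remember every earlier record.
class CountryTotals {
  private totals = new Map<string, number>();

  add(e: CleanEvent): void {
    this.totals.set(e.country, (this.totals.get(e.country) ?? 0) + e.amount);
  }

  get(country: string): number {
    return this.totals.get(country) ?? 0;
  }
}

// Usage: two valid events and one that fails the schema check.
const events: RawEvent[] = [
  { userId: "u1", email: "Alice@Example.com", country: " gb ", amount: 10 },
  { userId: "u2", email: "bob@example.org", country: "GB", amount: 5 },
  { userId: 42, email: "x@y.z", country: "US", amount: 1 },
];
const clean = events.map(transform).filter((e): e is CleanEvent => e !== null);
const totals = new CountryTotals();
clean.forEach((e) => totals.add(e));
console.log(clean.length, totals.get("GB"));
```

Because `transform` keeps no state, it scales out trivially; `CountryTotals`, by contrast, needs its map to be stored durably and recovered after failures, which is why stateful features require more of Arroyo's engine.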
Joel Hatmaker, Engineering Director at McGaw.io, remarked:
If you’re already using Cloudflare for performance and security, the Cloudflare Data Platform becomes an extremely compelling addition.
According to Cloudflare, upcoming features are slated for release in the first half of 2026, including integration with Logpush, user-defined functions via Workers, and support for aggregations and joins in R2 SQL.
A hands-on tutorial walks users through building an end-to-end analytics data system with Pipelines, R2 Data Catalog, and R2 SQL. During the public beta, all three services are offered at no charge, although standard rates still apply for the storage and compute that queries incur.