PuppyGraph Launches Zero-ETL Query Engine to Simplify Integration of Graph and Relational Databases

2024-11-08

As businesses continue to invest heavily in advanced analytics and large language models (LLMs), graph technology has emerged as a favored option for building data infrastructure. It lets users understand intricate associations within datasets, relationships that are often hard to see in conventional relational databases.

Nevertheless, companies face numerous challenges and substantial costs when they maintain and query graph databases alongside traditional relational databases. PuppyGraph, a San Francisco-based startup founded by former Google and LinkedIn employees, recently secured $5 million in funding. The company aims to address these issues with what it describes as the world's first and only zero-ETL graph query engine. The engine lets users query existing relational data as a unified graph, without a separate graph database or cumbersome ETL (Extract, Transform, Load) processes.

Launched in March 2024, the engine has been adopted by numerous companies to streamline their data analysis processes. Its permanently free developer version has seen a monthly download growth rate of up to 70%.

The architecture of a graph database is akin to drawing on a whiteboard: information is stored as nodes (representing entities such as people, objects, and concepts) together with their context and the connections between them. With this structure, users can identify complex patterns and relationships that are difficult to detect with SQL queries against a traditional relational database. That in turn makes it quicker to deploy algorithms for use cases such as AI/machine learning, fraud detection, customer journey mapping, and network risk management.

Until now, the only way to use graph technology has been to set up a separate native graph database and keep it synchronized with the source database. Although this may sound straightforward, it is complex in practice: teams must build intricate, resource-intensive ETL pipelines to migrate data into graph storage. The process can cost millions of dollars and take months, during which users cannot run essential business queries.

Moreover, once the database is established, ongoing management is required, which further escalates costs and introduces scalability issues over the long term.

To address these challenges, former Google and LinkedIn employees Liu Weimo, Huang Lei, and Xu Danfeng co-founded PuppyGraph. Their objective is to provide a solution that allows users to query existing relational databases and data lakes as graphs without the need for data migration.

This approach enables data analyzed through SQL queries to also be examined as graphs, thereby facilitating quicker insights. It is particularly beneficial in scenarios where data possesses multi-level and complex associations, such as supply chains or cybersecurity.

Zhenni Wu, one of PuppyGraph's co-founders, stated that graph queries handle multi-level relationships more efficiently than traditional SQL queries. A graph query follows the connections directly along paths in the graph, regardless of how deep those connections go.
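To make the idea of traversing multi-level relationships concrete, here is a minimal Python sketch of a hop-limited breadth-first traversal. It is an illustration only, not PuppyGraph's implementation; the function names and the small supply-chain edge list are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn (source, target) pairs into an adjacency list."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def hops_from(adjacency, start, max_hops):
    """Return {node: hop_count} for every node reachable within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical supply chain: each pair is one directed relationship.
SUPPLY_EDGES = [
    ("factory", "warehouse"),
    ("warehouse", "distributor"),
    ("distributor", "retailer"),
    ("retailer", "customer"),
]

if __name__ == "__main__":
    graph = build_adjacency(SUPPLY_EDGES)
    print(hops_from(graph, "factory", 2))
```

Each additional hop is one more step along existing edges, whereas an equivalent SQL query would typically need one more self-join per hop.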

Wu further mentioned that PuppyGraph completely eliminates the need for complex ETL configurations, enabling a transition from deployment to querying in approximately 10 minutes. Users merely need to connect the tool to their chosen data sources. Once connected, the tool automatically generates graph schemas and queries tables within the graph model. Additionally, the engine's distributed architecture allows it to handle extremely large datasets and complex multi-hop queries.
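The general idea of viewing relational tables as a graph without copying the data can be sketched as follows. This is a hypothetical illustration using Python's built-in sqlite3 module; the mapping format, table names, and helper functions are assumptions made for the example and do not reflect PuppyGraph's actual schema format or API.

```python
import sqlite3

# Hypothetical mapping: which tables act as vertices and which as edges.
GRAPH_MAPPING = {
    "vertices": {"Account": {"table": "accounts", "id": "account_id"}},
    "edges": {
        "TRANSFERRED_TO": {
            "table": "transfers",
            "from": "src_account",
            "to": "dst_account",
        }
    },
}


def demo_connection():
    """Create an in-memory relational database with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id TEXT PRIMARY KEY);
        CREATE TABLE transfers (src_account TEXT, dst_account TEXT, amount REAL);
        INSERT INTO accounts VALUES ('A'), ('B'), ('C');
        INSERT INTO transfers VALUES ('A', 'B', 10.0), ('A', 'C', 5.0), ('B', 'C', 2.5);
        """
    )
    return conn


def neighbors(conn, mapping, edge_label, vertex_id):
    """Read outgoing edges straight from the source table; nothing is copied."""
    edge = mapping["edges"][edge_label]
    # Identifiers come from the trusted mapping; the vertex id is a bound parameter.
    sql = (
        f'SELECT {edge["to"]} FROM {edge["table"]} '
        f'WHERE {edge["from"]} = ? ORDER BY {edge["to"]}'
    )
    return [row[0] for row in conn.execute(sql, (vertex_id,))]


if __name__ == "__main__":
    connection = demo_connection()
    print(neighbors(connection, GRAPH_MAPPING, "TRANSFERRED_TO", "A"))
```

In this sketch the rows stay in the relational store, and the graph exists only as a mapping layered on top of them, which is the property that removes the need for an ETL pipeline.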

PuppyGraph can integrate with all major data lakes and warehouses, including Google BigQuery and Databricks, to perform accelerated graph analyses while keeping costs low.

“Separating storage and compute architectures means that low cost is one of PuppyGraph's greatest advantages. Without storage costs, the engine directly queries data from the user's existing data lakes/warehouses. It offers the flexibility to scale compute resources on demand, allowing adjustments as needed to efficiently handle fluctuating workloads without causing resource contention or performance degradation,” Wu added.

Although the company is less than a year old, it has already partnered with numerous businesses, including Coinbase, Clarivate, Dawn Capital, and Prevalent AI, achieving significant results.

One company reduced its total cost of ownership by over 80% after migrating from a legacy graph database system to PuppyGraph. A leading financial trading platform was able to run 5-hop path queries between Account A and Account B, over a graph of approximately one billion edges, in under 3 seconds. Before adopting PuppyGraph, its self-built SQL-based solution could not query beyond 3 hops and suffered from batch processing timeouts.
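For readers unfamiliar with the term, an n-hop path query between two accounts asks whether one can be reached from the other in at most n steps. A minimal sketch in Python, assuming a tiny hypothetical list of transfers (the data and function name are illustrative and unrelated to the platform described above):

```python
from collections import deque


def find_path(edges, source, target, max_hops):
    """Return the shortest path from source to target using at most
    max_hops edges, or None if no such path exists."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    parents = {source: None}  # also serves as the visited set
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        if depth[node] == max_hops:
            continue  # hop budget exhausted on this branch
        for nxt in adjacency.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return None


# Hypothetical transfers: A reaches B only through four intermediaries.
TRANSFERS = [
    ("A", "M1"), ("M1", "M2"), ("M2", "M3"),
    ("M3", "M4"), ("M4", "B"), ("A", "X"),
]

if __name__ == "__main__":
    print(find_path(TRANSFERS, "A", "B", max_hops=5))
    print(find_path(TRANSFERS, "A", "B", max_hops=3))
```

With a 3-hop limit the search stops before reaching B, which mirrors the kind of connection a depth-limited query would miss.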

With this funding, the company plans to accelerate product development, expand its team, and increase market influence by bringing its zero-ETL graph query engine to more organizations worldwide.

According to Gartner, the graph technology market is projected to reach $3.2 billion by 2025, with a compound annual growth rate of 28.1%. Other players in this field include Neo4j, Amazon Neptune, Aerospike, and ArangoDB.