
DataPelago aims to save enterprise $ via universal data processing


As data continues to be key to business success, enterprises are racing to extract maximum value from the information they hold. But enterprise data volumes are growing so quickly (doubling every two years) that the computing power needed to process them in a timely and cost-efficient manner is hitting a ceiling.

California-based DataPelago aims to solve this with a “universal data processing engine” that allows enterprises to supercharge the performance of existing data query engines (including open-source ones) using the power of accelerated computing elements such as GPUs and FPGAs (field-programmable gate arrays). This enables the engines to process exponentially increasing volumes of complex data across varied formats.

The startup has just emerged from stealth but is already claiming to deliver a five-fold reduction in query/job latency while providing significant cost benefits. It has also raised $47 million in funding with the backing of multiple venture capital firms, including Eclipse, Taiwania Capital, Qualcomm Ventures, Alter Venture Partners, Nautilus Venture Partners and Silicon Valley Bank.

Addressing the data challenge

More than a decade ago, structured and semi-structured data analysis was the go-to option for data-driven growth, providing enterprises with a snapshot of how their business was performing and what needed to be fixed.

The approach worked well, but advances in technology also brought a rise in unstructured data (images, PDFs, audio and video files) within enterprise systems. Initially, the volume of this data was small, but today it accounts for 90% of all information created, far more than structured and semi-structured data, and it is critical for advanced enterprise applications such as large language models.

Now, as enterprises look to mobilize all their data assets, including large volumes of unstructured data, for these use cases, they are running into performance bottlenecks and struggling to process that data in a timely and cost-effective way.

The reason, according to DataPelago CEO Rajan Goyal, lies in the computing limitations of legacy platforms, which were originally designed for structured data and general-purpose computing on CPUs.

“Today, companies have two choices for accelerated data processing…Open-source systems offered as a managed service by cloud service providers have smaller licensing fees but require users to pay more for cloud infrastructure compute costs to reach an acceptable level of performance. On the other hand, proprietary services (built with open-source frameworks or otherwise) can be inherently more performant, but they have much higher licensing fees. Both choices result in higher total cost of ownership (TCO) for customers,” he explained.

To address this performance and cost gap for next-gen data workloads, Goyal started building DataPelago, a unified platform that dynamically accelerates query engines with accelerated computing hardware such as GPUs and FPGAs, enabling them to handle advanced processing needs for all types of data without a massive increase in TCO.

“Our engine accelerates open-source query engines like Apache Spark or Trino with the power of GPUs resulting in a 10:1 reduction in the server count, which results in lower infrastructure cost and lower licensing cost in the same proportion. Customers see disruptive price/performance advantages, making it viable to leverage all the data they have at their disposal,” Goyal said.

At its core, DataPelago’s offering consists of three main components: DataApp, DataVM and DataOS. DataApp is a pluggable layer that integrates DataPelago with open data processing frameworks such as Apache Spark or Trino, extending them at the planner and executor node level.

Once the framework is deployed, users run their queries or data pipelines unmodified, with no changes required in the user-facing application. On the backend, the framework’s planner converts each query into a plan, which DataPelago then takes over. The engine uses an open-source library such as Apache Gluten to convert the plan into Substrait, an open-standard intermediate representation (IR). The IR is sent to the executor node, where DataOS converts it into an executable data flow graph (DFG).
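To make the plan-to-graph step more concrete, here is a minimal Python sketch that flattens a query plan tree into a data flow graph and orders its operators for execution. It is an illustration under stated assumptions, not DataPelago’s code: `PlanNode`, `to_dfg` and `execution_order` are hypothetical names, and the plan is a toy tree rather than a real Substrait message (Substrait plans are serialized as Protocol Buffers).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


# Hypothetical stand-in for one relational operator in an IR plan.
# This is NOT the Substrait schema; it only mimics the tree shape of a plan.
@dataclass
class PlanNode:
    op: str  # e.g. "read", "filter", "aggregate", "join"
    inputs: list["PlanNode"] = field(default_factory=list)


def to_dfg(root: PlanNode) -> dict[str, set[str]]:
    """Flatten a plan tree into a DFG: node id -> ids of the nodes it consumes."""
    graph: dict[str, set[str]] = {}
    counter = 0

    def visit(node: PlanNode) -> str:
        nonlocal counter
        # Visit inputs first so every producer gets an id before its consumer.
        child_ids = {visit(child) for child in node.inputs}
        node_id = f"{node.op}#{counter}"
        counter += 1
        graph[node_id] = child_ids
        return node_id

    visit(root)
    return graph


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Order operators so each one runs only after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


plan = PlanNode("aggregate", [PlanNode("filter", [PlanNode("read")])])
print(execution_order(to_dfg(plan)))  # ['read#0', 'filter#1', 'aggregate#2']
```

Ordering producers before consumers is what lets an executor start each operator once its inputs are ready; independent branches of the graph, such as the two sides of a join, could also run in parallel.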

Finally, the DataVM evaluates the nodes of the DFG and dynamically maps each one to the right computing element (CPU, FPGA, Nvidia GPU or AMD GPU) based on availability or cost/performance characteristics. This way, the system routes the workload to the most suitable hardware available from hyperscalers or GPU cloud providers, maximizing both performance and cost savings.
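The placement decision described above can be approximated with a simple cost model: estimate how long an operator would take on each available device, multiply by that device’s hourly price, and pick the cheapest option. The sketch below assumes exactly that model; the `Device` type, the `place` function, the prices and the throughput figures are all hypothetical placeholders, not DataPelago numbers, and the article does not say how the DataVM actually weighs availability against cost/performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str             # e.g. "cpu", "fpga", "nvidia-gpu"
    available: bool       # can a device of this type be used right now?
    cost_per_hour: float  # hypothetical price in dollars per hour


# Hypothetical throughput (rows per second) of each operator type on each
# device type. A real system would derive figures like these from profiling.
THROUGHPUT: dict[str, dict[str, float]] = {
    "filter": {"cpu": 50e6, "nvidia-gpu": 1e9},
    "regex_match": {"cpu": 2e6, "fpga": 40e6},
}


def estimated_cost(op: str, rows: int, device: Device) -> float:
    """Estimated dollar cost of running `op` over `rows` rows on `device`."""
    seconds = rows / THROUGHPUT[op][device.name]
    return seconds / 3600 * device.cost_per_hour


def place(op: str, rows: int, devices: list[Device]) -> str:
    """Pick the cheapest available device that supports this operator."""
    supported = THROUGHPUT.get(op, {})
    candidates = [d for d in devices if d.available and d.name in supported]
    if not candidates:
        raise ValueError(f"no available device can run {op!r}")
    return min(candidates, key=lambda d: estimated_cost(op, rows, d)).name


fleet = [
    Device("cpu", available=True, cost_per_hour=1.0),
    Device("fpga", available=True, cost_per_hour=3.0),
    Device("nvidia-gpu", available=True, cost_per_hour=4.0),
]
print(place("filter", 1_000_000_000, fleet))       # nvidia-gpu
print(place("regex_match", 1_000_000_000, fleet))  # fpga
```

Marking the GPU as unavailable (`available=False`) makes the same filter fall back to the CPU, which mirrors the availability-based routing the article describes. A production scheduler would also account for data-transfer overhead between devices, which this toy model ignores.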

Significant savings for early DataPelago adopters

While the technology to dynamically accelerate query engines with accelerated computing is new, the company is already claiming it can deliver a five-fold reduction in query/job latency with a two-fold reduction in TCO compared to existing data processing engines.

“One company we’re working with was spending $140M on one workload, with 90% of this cost going to compute. We are able to decrease their total spend to less than $50M,” Goyal said.
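Goyal’s figures can be checked with a few lines of arithmetic. The one assumption, labeled in the code, is that the 10% of spend that is not compute stays the same after the switch; the article does not say how the savings split between compute and other costs.

```python
# Figures quoted by Goyal, in millions of dollars.
total_before = 140
compute_share_pct = 90
total_after_ceiling = 50  # "less than $50M", so this is an upper bound

compute_before = total_before * compute_share_pct / 100  # 126.0
other_before = total_before - compute_before              # 14.0

# Overall reduction; both values are lower bounds because the new spend is under $50M.
overall_factor = total_before / total_after_ceiling                # 2.8
overall_cut_pct = 100 * (1 - total_after_ceiling / total_before)   # about 64.3

# Assumption (not stated in the article): non-compute spend is unchanged,
# so all of the savings come out of compute.
compute_after_ceiling = total_after_ceiling - other_before  # 36.0
compute_factor = compute_before / compute_after_ceiling     # 3.5
```

In other words, the quoted case implies total spend falls by more than 2.8x (over 64%), and compute spend by more than 3.5x if other costs hold steady.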

He did not share the total number of companies working with DataPelago, but he did point out that the company is seeing significant traction from enterprises across verticals such as security, manufacturing, finance, telecommunications, SaaS and retail. The existing customer base includes notable names such as Samsung SDS, McAfee and insurance technology provider Akad Seguros, he added.

“DataPelago’s engine allows us to unify our GenAI and data analytics pipelines by processing structured, semi-structured, and unstructured data on the same pipeline while reducing our costs by more than 50%,” André Fichel, CTO at Akad Seguros, said in a statement.

As a next step, Goyal plans to build on this work and bring the company’s solution to more enterprises looking to accelerate their data workloads while keeping costs in check.

“The next phase of growth for DataPelago is building out our go-to-market team to help us manage the high number of customer conversations we’re already engaging in, as well as continue to grow into a global service,” he said.
