BigDataWire Exclusive Interview: DataPelago CEO on Launching the Spark Accelerator

Apache Spark remains one of the most widely used engines for large-scale data processing, but it was built in an era when cloud infrastructure was almost entirely CPU-based. Today's cloud environments look very different.

Organizations are running workloads across GPUs, FPGAs, and a range of specialized hardware, yet many open-source data systems haven’t adapted. As a result, teams are spending more on compute but not seeing the performance gains they expect.

DataPelago believes that can change. The company has launched a new Spark Accelerator that combines native execution with CPU vectorization and GPU support. Built on the company's Universal Data Processing Engine, the Accelerator lets organizations run analytics, ETL, and GenAI workloads across modern compute environments without rewriting code or pipelines.

According to the company, the Spark Accelerator works inside existing Spark clusters and does not require reconfiguration. It analyzes workloads as they run and chooses the best available processor for each part of the job, whether that is a CPU, a GPU, or an FPGA. The company says this can speed up Spark jobs by up to 10x while lowering compute costs by as much as 80%.

DataPelago Founder and CEO – Rajan Goyal

DataPelago Founder and CEO Rajan Goyal shared more details in an exclusive interview with BigDataWire, describing the Spark Accelerator as a response to the widening gap between data systems and modern infrastructure. “If you look at the servers in the public cloud today, they are not CPU-only servers. They are all CPU plus something,” Goyal said. “But many of the data stacks written last decade were built for single software environments, usually Java-based or C++-based, and only using CPU.”

The DataPelago Accelerator for Spark connects to existing Spark clusters using standard configuration hooks and runs alongside Spark without disrupting jobs. Once it is active, it analyzes query plans as they are generated and determines where each part of the workload should run, whether on CPU, GPU, or other accelerators. 

These decisions happen at runtime based on the available hardware and the specific characteristics of the job. “We’re not replacing Spark. We extend it,” Goyal said. “Our system acts as a sidecar. It hooks into Spark clusters as a plugin and optimizes what happens under the hood without any change to how users write code.”
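Spark 3.0 and later exposes a standard extension point for this kind of sidecar: the `spark.plugins` setting, a comma-separated list of classes implementing `org.apache.spark.api.plugin.SparkPlugin`. The sketch below assumes an accelerator packaged that way and shows that enabling it touches only configuration, not job code. The class name and JAR path are hypothetical placeholders, not DataPelago's actual artifacts, and the JAR is assumed to be pre-installed on every node.

```python
import shlex

# Hypothetical names for illustration only; not DataPelago's real artifacts.
PLUGIN_CLASS = "com.example.accel.AcceleratorPlugin"
PLUGIN_JAR = "/opt/accel/accelerator-plugin.jar"


def _append(value: str, item: str, sep: str) -> str:
    """Append item to a separator-delimited setting unless already present."""
    parts = [p for p in value.split(sep) if p]
    if item not in parts:
        parts.append(item)
    return sep.join(parts)


def with_plugin(conf: dict[str, str]) -> dict[str, str]:
    """Return a copy of a Spark conf with the (hypothetical) plugin registered.

    The job's own code is untouched; only configuration entries change.
    """
    out = dict(conf)
    # spark.plugins takes a comma-separated list of SparkPlugin classes,
    # so append to any plugins already configured instead of overwriting.
    out["spark.plugins"] = _append(out.get("spark.plugins", ""), PLUGIN_CLASS, ",")
    # Assumes the JAR already exists at PLUGIN_JAR on the driver and executors.
    for key in ("spark.driver.extraClassPath", "spark.executor.extraClassPath"):
        out[key] = _append(out.get(key, ""), PLUGIN_JAR, ":")
    return out


def spark_submit_cmd(app: str, conf: dict[str, str]) -> list[str]:
    """Render a spark-submit command line from a conf dictionary."""
    cmd = ["spark-submit"]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    return cmd + [app]


if __name__ == "__main__":
    conf = with_plugin({"spark.executor.memory": "8g"})
    print(shlex.join(spark_submit_cmd("etl_job.py", conf)))
```

Because `with_plugin` is idempotent and preserves existing settings, the same job can be submitted with or without the plugin simply by toggling the configuration.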

Goyal explained that this kind of runtime flexibility is key to delivering performance without creating new complexity for users. “There is no one silver bullet,” he said. “All of them have different performance points or performance per dollar points. In our workload, there are different characteristics that you need.” By adapting to the hardware available in each environment, the system can make better use of modern infrastructure without forcing users to re-architect their pipelines.
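To make the "performance per dollar" idea concrete, here is a toy placement sketch: for each stage of a query plan, pick the available device with the lowest estimated dollar cost (runtime × hourly price). This is not DataPelago's algorithm, which the company has not described in detail; the device prices, stage names, and runtime estimates are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    dollars_per_hour: float


@dataclass(frozen=True)
class Stage:
    name: str
    # Hypothetical runtime estimates in seconds, keyed by device name.
    # A device missing from this map cannot run the stage.
    est_seconds: dict[str, float]


def stage_cost(stage: Stage, device: Device) -> float:
    """Estimated dollar cost of running a stage on a device."""
    return stage.est_seconds[device.name] / 3600 * device.dollars_per_hour


def place(stages: list[Stage], devices: list[Device]) -> dict[str, str]:
    """Assign each stage to the cheapest available device that supports it."""
    plan = {}
    for stage in stages:
        candidates = [d for d in devices if d.name in stage.est_seconds]
        if not candidates:
            raise ValueError(f"no available device can run stage {stage.name!r}")
        best = min(candidates, key=lambda d: stage_cost(stage, d))
        plan[stage.name] = best.name
    return plan


# Invented numbers: a GPU instance costs 4x more per hour than a CPU one.
CPU = Device("cpu", 1.0)
GPU = Device("gpu", 4.0)
STAGES = [
    Stage("scan", {"cpu": 600, "gpu": 300}),        # 2x faster on GPU: not worth 4x price
    Stage("join_agg", {"cpu": 1800, "gpu": 180}),   # 10x faster on GPU: cheaper there
    Stage("parse_json", {"cpu": 900}),              # assumed CPU-only in this toy
]

if __name__ == "__main__":
    print(place(STAGES, [CPU, GPU]))
    print(place(STAGES, [CPU]))  # same plan on a CPU-only cluster
```

The point the sketch illustrates is Goyal's "no one silver bullet": a GPU only wins when its speedup on a given stage outweighs its higher hourly price, and when no GPU is present every stage simply falls back to the CPU.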

That adaptability is already paying off for early users. A Fortune 100 company running petabyte-scale ETL pipelines reported a 3–4x improvement in job speed and cut its data processing costs by as much as 70%. While results vary by workload, Goyal said the savings are real and tangible. “Here is the cost reduction. That $100 will become either $60 or $40,” he said. “That is the actual benefit that the enterprise sees.”

Other early adopters have seen similar gains. RevSure, a major e-commerce company, deployed the Accelerator in just 48 hours and reported measurable improvements across its ETL pipeline, which processes hundreds of terabytes of data.

ShareChat, one of India’s largest social media platforms with more than 350 million users, saw job speeds double and infrastructure costs fall by 50% after adopting the Accelerator in production.

The approach is also drawing attention beyond early customers. Orri Erling, co-founder of the Velox project, sees DataPelago's work as a natural evolution of what open-source systems have accomplished on CPUs.

“Since its inception, Velox has been deeply focused on accelerating analytical workloads. To date, this acceleration has been oriented around CPUs, and we’ve seen the impact that lower latency and improved resource utilization have on businesses’ data management efforts,” Erling said. “DataPelago’s Accelerator for Spark, leveraging Nucleus for GPU architectures, introduces the potential for even greater speed and efficiency gains for organizations’ most demanding data processing tasks.”

The new Spark Accelerator builds directly on what DataPelago first introduced when it emerged from stealth in late 2024 with its Universal Data Processing Engine. At the time, the company described a virtualization layer that could route data workloads to the most suitable processor, without requiring any code changes. That early vision now forms the foundation for the performance improvements customers are reporting with the Spark Accelerator.

The Accelerator is available on both AWS and GCP, and organizations can also access it through the Google Cloud Marketplace. According to the company, deployment takes minutes rather than weeks, with no need to rewrite applications, swap out data connectors, or adjust security policies.

It integrates with Spark’s existing authentication and encryption protocols and includes built-in observability tools that allow teams to monitor performance in real time. That visibility, combined with plug-and-play integration, helps customers adopt the Accelerator without disrupting existing operations.

While initially focused on analytics and ETL, Goyal noted that demand is growing across AI and GenAI pipelines. “The compute footprint for these models is only going up,” he said. “Our goal is to help teams unlock that performance affordably without reinventing their infrastructure.”

As part of its next phase of growth, DataPelago recently appointed former SAP and Microsoft executive John “JG” Chirapurath as President. Chirapurath previously served as Executive Vice President and Chief Marketing & Solutions Officer at SAP, as well as Vice President of Azure at Microsoft. His addition signals the company’s push to scale adoption and deepen industry partnerships.

The post BigDataWire Exclusive Interview: DataPelago CEO on Launching the Spark Accelerator appeared first on BigDATAwire.