NVIDIA: Accelerate Spark With RAPIDS For Cost Savings


Jun 22, 12:00 PM PDT
  • Virtual SF Big Analytics
  • 29 RSVP

With Apache Spark, GPUs are typically used to speed up machine learning (ML) model training and inference, while data preparation stages traditionally run on CPUs. The RAPIDS Accelerator for Apache Spark is a plugin jar that takes advantage of Apache Spark 3.x's ability to schedule tasks on GPUs. The RAPIDS Accelerator replaces CPU expressions in a physical plan with GPU equivalents for DataFrame operations. No code changes are required, which makes the transition to GPUs seamless.
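
Because the accelerator is enabled through configuration rather than code changes, turning it on usually comes down to a few extra spark-submit options. The following is a minimal sketch, not an official launcher: the helper name rapids_submit_args, the jar path, and the application file are hypothetical placeholders, and the GPU amounts are illustrative values that would need tuning for a real cluster.

```python
def rapids_submit_args(app, plugin_jar, gpus_per_executor=1, task_gpu_fraction=0.25):
    """Build a spark-submit argument list that enables the RAPIDS Accelerator.

    Hypothetical helper for illustration; jar path and amounts are assumptions.
    """
    conf = {
        # Load the RAPIDS Accelerator as a Spark plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow the plugin to replace supported CPU operators with GPU ones.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.x GPU scheduling: GPUs per executor and GPU share per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(task_gpu_fraction),
    }
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the unchanged Spark application
    return args


# Example with placeholder paths (both are assumptions, not real artifacts).
args = rapids_submit_args("etl_job.py", "/path/to/rapids-4-spark.jar")
print(" ".join(args))
```

The application itself (etl_job.py here) stays the same; only the submit-time configuration differs between the CPU and GPU runs.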

We'll give an overview of what the RAPIDS Accelerator is, how it works, and the benefits of using it. We'll discuss benchmarks showing the performance and cost benefits of leveraging GPUs for Spark ETL processing, and we'll showcase a user tool that helps estimate speedups and cost savings.
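
To make the link between speedup and cost concrete, here is a back-of-the-envelope sketch of the arithmetic. It is not how the tool from the talk works; the function name estimate_savings and every number in the example are hypothetical, chosen only to show that a GPU cluster with a higher hourly rate can still cost less if the job finishes sufficiently faster.

```python
def estimate_savings(cpu_hours, cpu_rate, gpu_rate, speedup):
    """Compare job cost on CPU and GPU clusters.

    cpu_hours: job runtime on the CPU cluster, in hours
    cpu_rate / gpu_rate: cluster cost per hour (same currency)
    speedup: CPU runtime divided by GPU runtime (assumed, not measured)
    """
    if speedup <= 0 or cpu_hours <= 0 or cpu_rate <= 0:
        raise ValueError("cpu_hours, cpu_rate, and speedup must be positive")
    gpu_hours = cpu_hours / speedup
    cpu_cost = cpu_hours * cpu_rate
    gpu_cost = gpu_hours * gpu_rate
    return {
        "gpu_hours": gpu_hours,
        "cpu_cost": cpu_cost,
        "gpu_cost": gpu_cost,
        # Fraction of the CPU cost saved; negative means the GPU run costs more.
        "savings_fraction": 1 - gpu_cost / cpu_cost,
    }


# Hypothetical inputs: an 8-hour CPU job, a GPU cluster that costs 1.5x as
# much per hour, and an assumed 4x speedup.
result = estimate_savings(cpu_hours=8, cpu_rate=2.0, gpu_rate=3.0, speedup=4)
print(result)
```

With these made-up inputs the GPU run takes 2 hours and costs 6 instead of 16. The break-even point is where the speedup equals the ratio of the hourly rates.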

Sameer Raheja (NVIDIA)

Sameer Raheja is Senior Director of Engineering at NVIDIA. He has worked on internet-scale data pipelines for over a decade. Prior to NVIDIA, he worked at Yahoo building big data applications on Apache Hadoop, Hive, and Storm. He has also worked on data pipelines for autonomous vehicles. Sameer holds BS and MS degrees from MIT.
The event ended.