
GPUs are commonly used with Apache Spark to speed up machine learning (ML) model training and inference, while data preparation stages have traditionally run on CPUs. The RAPIDS Accelerator for Apache Spark is a plugin jar that takes advantage of Apache Spark 3.x's ability to schedule tasks on GPUs. For DataFrame operations, it replaces CPU operators and expressions in the physical plan with GPU equivalents. No code changes are required, which makes the transition to GPUs seamless.
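Because the accelerator works at the physical-plan level, turning it on is a matter of configuration rather than application code. The sketch below assumes a Spark 3.x cluster where GPU resource discovery is already set up. It assembles the settings commonly used to load the plugin and renders them as `spark-submit` arguments. The helper names `rapids_conf` and `spark_submit_args` and the jar filename are hypothetical. The config keys are standard Spark and RAPIDS Accelerator settings, but the values are illustrative and untuned.

```python
# Hypothetical helpers: build Spark 3.x settings commonly used to enable the
# RAPIDS Accelerator plugin. Values are illustrative, not tuned for any cluster.


def rapids_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> dict[str, str]:
    """Return Spark config entries that load the RAPIDS Accelerator plugin."""
    return {
        # Load the RAPIDS Accelerator SQL plugin shipped in the plugin jar.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow the plugin to replace supported physical-plan operators with GPU versions.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.x resource scheduling: GPUs per executor ...
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # ... and the fraction of one GPU each task requests (1/4 lets 4 tasks share a GPU).
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def spark_submit_args(conf: dict[str, str], jar: str) -> list[str]:
    """Render the plugin jar and config entries as spark-submit arguments."""
    args = ["--jars", jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder jar name (assumption): substitute the release you actually use.
    print(" ".join(spark_submit_args(rapids_conf(), "rapids-4-spark_2.12-<version>.jar")))
```

The existing Spark job is then submitted unchanged with these extra arguments. Appropriate values depend on the cluster and workload.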
We'll give an overview of what the RAPIDS Accelerator is, how it works, and the benefits of using it. We'll discuss benchmarks that show the performance and cost benefits of using GPUs for Spark ETL processing, and we'll showcase a user tool that helps estimate speedups and cost savings.