Big Data Analytics: An Interactive Introduction to Apache Beam


May 06, 10:00 AM PDT
  • Virtual
  • 400 RSVP
Description
Introducing BeamLearningMonth in May 2020! In collaboration with the Google Cloud team, we are hosting a series of practical introductory sessions on Apache Beam!

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

This is session 1 of the series:
In this talk, we introduce Apache Beam in Jupyter notebooks by live coding both a batch and a streaming pipeline on publicly available COVID-19 data.

For more talks on Apache Beam, join Session 2 on May 13th at 10:00 AM PDT.

Samuel and Ning

Samuel Rohde is a Software Engineer at Google and has worked on the Cloud Dataflow team for the past 5 years. He graduated from UIUC. Sam has been contributing to the Apache Beam source code for the past couple of years.

Ning Kang is a member of the Google Cloud Dataflow team and has been contributing to the Apache Beam Interactive Notebook OSS project. Before that, he was a software engineer on the Google Store team, where he helped with 3 large hardware sales events (Pixel phones, etc.). Before joining Google, he worked in the EMR software industry.

The event has ended.
Watch Recording
*Recordings are hosted on YouTube; clicking the link opens the YouTube page.