Best practices towards a production-ready pipeline with Apache Beam

May 13, 10:00 AM PDT

Virtual AICamp
340 RSVPs

Introducing BeamLearningMonth in May 2020! In collaboration with the Google Cloud team, we are hosting a series of practical introductory sessions on Apache Beam!

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
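As a rough illustration of that model, here is a minimal sketch using the Beam Python SDK. The function names (extract_words, run) and the sample input are made up for this example, and it assumes the apache-beam package is installed. The same pipeline definition can be handed to a different back-end by changing the runner option.

```python
import re
from typing import Iterable, List


def extract_words(line: str) -> List[str]:
    """Split a line into lowercase words (pure Python, no Beam needed)."""
    return re.findall(r"[a-z']+", line.lower())


def run(lines: Iterable[str], runner: str = "DirectRunner") -> None:
    """Define a word-count pipeline and execute it on the chosen runner.

    Beam is imported here so that extract_words can be used and tested
    even where apache-beam is not installed.
    """
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # The runner is selected through pipeline options; "DirectRunner"
    # executes locally, while other values target a distributed back-end.
    options = PipelineOptions([f"--runner={runner}"])

    # Leaving the "with" block runs the pipeline and waits for it to finish.
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "CreateLines" >> beam.Create(list(lines))
            | "ExtractWords" >> beam.FlatMap(extract_words)
            | "CountPerWord" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling run(["to be or not to be"]) would execute the pipeline locally and print one (word, count) pair per distinct word.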

This is session 2 of the series:
Data engineering is a very interesting field, with lots of new technologies and opportunities. Unfortunately, it takes a long time to master, and there are not many resources for intermediate practitioners.
In this talk, we will take a public dataset and a concept, and we will build an Apache Beam pipeline that is stable and ready to run in production. We will walk through the workflow of starting the project in an IDE, writing and organizing pipeline code, and writing and running tests. You can adapt this model for your own pipeline, and I will be happy to answer your questions!
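One way to picture the "organize the code, then test it" part of that workflow is sketched below. This is not the speaker's project: the record format ("station,temperature") and the names parse_reading and ParseReadingTest are assumptions chosen for illustration. The idea is to keep the parsing logic in a plain function, test it directly, and then test it again inside a small Beam test pipeline when apache-beam is available.

```python
import unittest
from typing import Iterator, Tuple


def parse_reading(line: str) -> Iterator[Tuple[str, float]]:
    """Parse a hypothetical 'station,temperature' line.

    Yields one (station, temperature) pair, or nothing if the line is
    malformed, so bad records are dropped instead of crashing the job.
    """
    fields = line.strip().split(",")
    if len(fields) != 2:
        return
    station, raw_value = fields
    try:
        yield station, float(raw_value)
    except ValueError:
        return


class ParseReadingTest(unittest.TestCase):
    def test_valid_line(self):
        self.assertEqual(list(parse_reading("s1,20.5")), [("s1", 20.5)])

    def test_malformed_lines_are_dropped(self):
        self.assertEqual(list(parse_reading("garbage")), [])
        self.assertEqual(list(parse_reading("s1,n/a")), [])

    def test_inside_a_beam_pipeline(self):
        # Skip rather than fail where apache-beam is not installed.
        try:
            import apache_beam as beam
            from apache_beam.testing.test_pipeline import TestPipeline
            from apache_beam.testing.util import assert_that, equal_to
        except ImportError:
            self.skipTest("apache-beam is not installed")

        with TestPipeline() as pipeline:
            output = (
                pipeline
                | beam.Create(["s1,20.5", "garbage"])
                | beam.FlatMap(parse_reading)
            )
            # The assertion is checked when the pipeline runs on exit.
            assert_that(output, equal_to([("s1", 20.5)]))
```

Saved as a module, these tests could be run with python -m unittest; the first two need only the standard library.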

For more talks on Apache Beam, join and watch our Session 1 on May 6th, 10:00 AM PDT.
