Modern Data Processing Pipelines


Jul 26, 12:00 PM PDT
  • Virtual SF Big Analytics
  • 125 RSVP
Description

In this tech talk, I will share best practices I have discovered over the last 7 years of building data streaming applications, including IoT, CDC, logs, and data feeds.

The modern approach to data processing combines several highly scalable open-source frameworks, using each for what it does best. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL for aggregations, real-time alerts, and populating Delta Lake.
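
As a rough illustration of the enrichment step, here is a minimal sketch of a Pulsar Function written in Pulsar's native Python style, where a module exposes a plain `process` function that receives one message and returns the output message. The event shape (`temperature_c`), the added fields, and the alert threshold are assumptions made up for this example; they are not taken from the talk.

```python
import json

# Hypothetical alert threshold in degrees Celsius, used only for illustration.
TEMPERATURE_ALERT_C = 30.0


def process(input):
    """Enrich one JSON sensor event with a Fahrenheit reading and an alert flag.

    Pulsar calls this function once per message on the input topic and
    publishes the returned value to the output topic.
    """
    event = json.loads(input)

    # Assumed input field: the temperature in degrees Celsius.
    celsius = float(event["temperature_c"])

    # Enrichment: add a derived field and a simple threshold-based flag.
    event["temperature_f"] = round(celsius * 9 / 5 + 32, 1)
    event["alert"] = celsius > TEMPERATURE_ALERT_C

    return json.dumps(event)
```

Because the function is plain Python, it can be exercised locally with a JSON string before being deployed to a Pulsar cluster. A downstream Flink SQL query could then aggregate or filter on the added `alert` field.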

Timothy Spann
Developer Advocate, StreamNative, and former Principal DataFlow Field Engineer at Cloudera, Hortonworks and Pivotal.
David Kjerrumgaard
Apache Pulsar Committer | Author of Pulsar in Action. Former Principal Software Engineer on Splunk’s messaging team, responsible for Splunk’s internal Pulsar-as-a-Service platform.
The event has ended. A recording of the talk is hosted on YouTube.