
In this tech talk, I will share best practices I have learned over the last seven years of building data streaming applications, including IoT, CDC, logs, and data feeds.
In the modern approach to data processing, we combine several highly scalable open-source frameworks to take advantage of the best features of each. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar.
From there we build streaming ETL with Apache Spark, and enhance events with Pulsar Functions for ML and enrichment.
We build continuous queries against our topics with Flink SQL for aggregations, real-time alerts, and Delta Lake population.
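
To give a flavor of the enrichment step mentioned above, here is a minimal sketch of a Pulsar Function written in the language-native Python style, where a plain function receives each message and its return value is published to the output topic. The event shape, the `device_id` field, and the `DEVICE_LOCATIONS` lookup table are assumptions made up for illustration, not part of the talk.

```python
import json

# Hypothetical lookup table for illustration only; a real pipeline
# might consult a cache, a database, or an ML model instead.
DEVICE_LOCATIONS = {
    "sensor-1": "warehouse-a",
    "sensor-2": "loading-dock",
}


def process(input):
    """Enrich one JSON-encoded event with a location keyed by device_id."""
    event = json.loads(input)
    # Events without a known device_id are tagged "unknown" rather than dropped.
    event["location"] = DEVICE_LOCATIONS.get(event.get("device_id"), "unknown")
    return json.dumps(event)
```

Because the function is plain Python with no framework imports, it can be unit tested locally before it is deployed against real topics.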