Inspired by this story, we built two demonstration streaming pipelines for ingesting, storing, and visualizing public IoT data (tidal data from NOAA, the National Oceanic and Atmospheric Administration) using multiple open source technologies. Both pipelines shared the same ingestion technologies: Apache Kafka, Apache Kafka Connect, and Apache Camel Kafka Connector, supplemented with Prometheus and Grafana for monitoring. The first experiment used Open Distro for Elasticsearch and Kibana as the target storage and visualization technologies, while the second used PostgreSQL and Apache Superset.
In this talk, we introduce each technology and the pipeline architecture, then walk through the steps followed, challenges encountered, and solutions used to build reliable and scalable pipelines and to visualize the results (including tidal periods, ranges, and locations). We compare and contrast the two approaches, focusing on exception handling, scalability, performance, and monitoring, and weigh the pros and cons of the two visualization technologies (Kibana and Superset).