Data Discovery Tech Talks by Lyft and Shopify

Jan 12, 12:00 PM PST (08:00 PM GMT).
  • Free 291 Attendees
This event is hosted by the San Francisco Big Analytics meetup group.

The event is dedicated to data discovery and data management: what they mean, why you should improve them, and how to go about it. It includes two examples of how others have done it: Amundsen from Lyft and Artifact from Shopify.

Talk #1: From discovery to trusting data, by Lyft
At Lyft, we have made our analysts and data scientists over 30% more productive by making it easier to discover data. This talk gives a quick overview of Amundsen and then goes into detail on how we have tried both automated and curated metadata to showcase what is and is not trusted in Amundsen. It will dive deep into linking the Airflow DAG that produced the data (task-level lineage), linking which and how many dashboards are built from a given data set (table-level lineage), as well as SLAs and historical landing times that give users a signal of what’s trusted.

Talk #2: How We’re Solving Data Discovery Challenges at Shopify
Shopify developed an in-house solution to its data discovery and management challenges. This talk will address the data discovery problems faced, how users were impacted, the solution and approach, and finally the trade-offs that were evaluated throughout the building process.

Mark (Lyft), Ranko (Shopify)

Mark Grover
Co-founder of Stemma. He is the co-creator of the open-source data discovery and metadata engine Amundsen and a co-author of the book Hadoop Application Architectures. Mark was previously a developer on Apache Spark at Cloudera and is a committer and PMC member on a few open-source ASF projects.

Ranko Cupovic
Senior Product Manager in Shopify's Data Science and Engineering organization, where he works on building data products.

The event ended.