As big data analytics becomes more popular, we see many tools aiming to solve very large-scale problems. However, the focus should be on the analytics itself, not on whether the data is "big" or "small". In pursuit of uncommon or unrealistic goals, popular tools have become difficult and tedious to use. More importantly, we are losing consistency between different solutions.
In this talk, we will discuss:
1. Pain points of building ETL and machine learning pipelines with existing popular frameworks
2. A new way of thinking when you encounter such problems in your work
3. Fugue, a new open source project from Lyft that realizes this idea, shown in a live demo
4. A case study of Fugue in production at Lyft
Research scientist at Lyft in New York, where he works on problems at the intersection of causal inference and machine learning. He also works part-time as an adjunct professor of mathematics at Columbia University, where he teaches the course Introduction to Data Science in Industry.
Tech lead of the Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the founder of the Fugue project, which aims to democratize distributed computing and machine learning.