One of the key bottlenecks in building ML systems is creating and managing the massive training datasets that today's models learn from. This talk outlines work on Snorkel, a framework for building and managing training datasets, and its effect on overall AI and ML application development.
The framework provides programmatic operators that let users build and manipulate training datasets: labeling functions for labeling unlabeled data, transformation functions for expressing data augmentation strategies, and slicing functions for partitioning and structuring training datasets. These operators allow domain experts to specify ML models via noisy operations over training data, leading to applications that can be built in hours or days rather than months or years.
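To make the labeling-function idea concrete, here is a minimal sketch in plain Python (a simplification for illustration, not Snorkel's actual API): each function votes a label or abstains, and the noisy votes are combined, here by simple majority vote, whereas Snorkel fits a generative label model to weigh the functions by estimated accuracy.

```python
# Simplified sketch of programmatic labeling (illustrative only, not Snorkel's API):
# each labeling function votes SPAM, HAM, or abstains; votes are combined by
# majority vote. Snorkel instead learns per-function accuracies with a label model.
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    # Heuristic: messages with URLs are often spam.
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_short_message(text):
    # Heuristic: very short messages tend to be benign.
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_money_words(text):
    # Heuristic: money-related bait words suggest spam.
    return SPAM if any(w in text.lower() for w in ("free", "winner", "cash")) else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_money_words]

def majority_label(text):
    # Aggregate non-abstaining votes; abstain if no function fires.
    votes = Counter(lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN)
    return votes.most_common(1)[0][0] if votes else ABSTAIN

print(majority_label("You are a WINNER, claim your cash at http://x.co"))  # 1 (SPAM)
```

Because the heuristics are code rather than hand labels, a domain expert can iterate on them quickly: inspect disagreements, refine a function, and relabel the entire dataset in one pass.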
This programmatic approach also enables more systematic, error-analysis-driven iteration when developing and monitoring AI and ML applications for real-world problems.
Paroma Varma (Snorkel AI)
Co-founder at Snorkel AI. She received her Ph.D. from Stanford University and her B.S. in EECS from UC Berkeley. Her research revolves around making machine learning easily usable for domain experts who do not have access to the massive datasets required to train complex models, and applying these methods across areas like medical imaging and autonomous driving. She is a recipient of the National Science Foundation and Stanford Graduate Fellowships, and the Arthur M. Hopkins Academic Achievement and Outstanding Course Development and Teaching Awards.