The identification of anomalous overdensities in data, known as group or collective anomaly detection, is a rich problem with a large number of real-world applications. However, it has received relatively little attention in the broader ML community compared with point anomalies and other types of single-instance outliers. One reason for this is the lack of powerful benchmark datasets.
In this talk, I will first explain how, after the Nobel Prize-winning discovery of the Higgs boson, unsupervised group anomaly detection has become a new frontier of fundamental physics, where the motivation is to find new particles and forces. Then I will discuss a realistic synthetic benchmark dataset (LHCO2020) for the development of group anomaly detection algorithms. Finally, I will introduce multiple statistically sound techniques for unsupervised group anomaly detection and demonstrate their performance on the LHCO2020 dataset.
Staff Scientist at Lawrence Berkeley National Laboratory (LBNL), leading the cross-cutting Machine Learning for Fundamental Physics group in the Physics Division. Ph.D. in Physics with a Ph.D. minor in Statistics from Stanford University (2016); Chamberlain Fellowship at LBNL, 2016-2020.