UMass Machine Learning and Friends Lunch: Grounding Deep Models of Visual Data

Abstract: Deep models are state-of-the-art for many computer vision tasks, including object classification, action recognition, and captioning. As Artificial Intelligence systems that use deep models become ubiquitous, it is becoming crucial to explain why they make certain decisions, that is, to ground model decisions. In this talk I will present:
1) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence behind a network prediction. This approach penalizes the neurons that are most relevant to the model's prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to keep minimizing the loss. We demonstrate better generalization ability, increased utilization of network neurons, and greater resilience to network compression.
2) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that uses spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to check that evidence and prediction are consistent; when they are not, it refines the prediction. We demonstrate accuracy gains for fine-grained classification.
3) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network's classification or captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action, or to a phrase from a caption, without explicitly optimizing or training for these tasks.
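
As a rough illustration of the guided dropout idea in part 1, the sketch below zeroes out some of the most salient units in a single layer. It is not the speaker's implementation: the function name `guided_dropout`, the parameters `top_fraction` and `drop_prob`, and the rule of picking candidates by a top-fraction cutoff are all assumptions made for this example. The saliency scores are taken as given, since the abstract does not say how they are computed.

```python
import random


def guided_dropout(activations, saliency, top_fraction=0.5, drop_prob=0.5, rng=None):
    """Zero out some of the most salient units (hypothetical sketch).

    activations  -- list of floats, one per neuron in a layer
    saliency     -- list of floats of the same length; a higher value means
                    the neuron is more relevant to the current prediction
    top_fraction -- assumed fraction of neurons treated as "high saliency"
    drop_prob    -- assumed probability of dropping each such neuron
    """
    if len(activations) != len(saliency):
        raise ValueError("activations and saliency must have the same length")
    if rng is None:
        rng = random.Random()

    n = len(activations)
    k = int(n * top_fraction)

    # Indices of the k neurons with the highest saliency scores.
    candidates = sorted(range(n), key=lambda i: saliency[i], reverse=True)[:k]

    # Each high-saliency neuron is dropped independently with drop_prob.
    dropped = {i for i in candidates if rng.random() < drop_prob}

    # Dropped neurons output zero; all others pass through unchanged.
    output = [0.0 if i in dropped else a for i, a in enumerate(activations)]
    return output, dropped


# With drop_prob=1.0 every high-saliency candidate is dropped.
acts = [0.2, 1.5, 0.7, 3.1]
sal = [0.1, 0.9, 0.3, 0.8]
out, dropped = guided_dropout(acts, sal, top_fraction=0.5, drop_prob=1.0)
print(out)  # neurons 1 and 3 are the two most salient, so they are zeroed
```

Unlike standard dropout, which picks units uniformly at random, this sketch restricts the candidates to the most salient units, so the remaining units have to carry the prediction. Rescaling of the surviving activations is left out for brevity.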

Bio: Sarah is a Postdoctoral Associate in the Image and Video Computing Group, working with Prof. Stan Sclaroff and Prof. Kate Saenko. She first joined the group in 2013 and went on to complete her PhD there with Prof. Stan Sclaroff. She is a recipient of the IBM PhD Fellowship and the Hariri Graduate Fellowship. Her research interests lie at the intersection of Computer Vision and Machine Learning.