Discovering The Structure Of Visual Categories From Weak Annotations
For an automatic system to answer queries like "birds with short beaks" or "planes with engines on their nose," one would expect underlying representations of these categories via their parts and attributes. However, building such models is challenging because exhaustively labeling these parts and attributes can be very expensive. In this talk I'll present two projects that aim to discover parts and attributes from weak annotations that can be effectively collected via crowd-sourcing. The first aims to discover parts that represent discriminative patterns from sparse landmark annotations. These parts, which we call poselets — examples include faces for humans or wheels for bicycles — can serve as a basis for a range of recognition tasks such as detection, segmentation, pose estimation, and attribute recognition. I'll also describe some recent work that simplifies this annotation task even further, extending it to categories for which landmarks are hard to define. The second project aims to discover describable attributes for fine-grained discrimination. We propose a novel annotation task in which annotators are asked to describe the differences between images, and we develop a structured topic model to analyze these descriptions. The output is a clustering of words into parts and modifiers, together with relations between clusters that represent attributes.
Subhransu Maji received the BTech degree in computer science and engineering from the Indian Institute of Technology, Kanpur, in 2006, and the PhD degree in computer science from the University of California, Berkeley, in 2011. He is currently a research assistant professor at TTI-Chicago. Earlier he was an intern in Google's image search group and INRIA's LEAR group, and a visiting researcher at Microsoft Research India and the CLSP center at Johns Hopkins University. He received the medal for the best graduating student in the computer science department at IIT Kanpur. He was a recipient of the Google graduate fellowship in 2008 and a best paper award at ICIF 2009. His primary interests are in computer vision and machine learning, with a focus on representations and efficient algorithms for visual recognition.