Erik G. Learned-Miller
Assistant Professor of Computer Science

140 Governors Drive, Office 248
Amherst, MA 01003

E-mail: elm at


National Science Foundation CAREER Award: Towards a Self-Taught Vision System

Research Plan | Publications | Projects

Research Plan
Using modern learning techniques, it is now possible to teach computers visual concepts through example-based learning. But this process is time-consuming and arduous. Large data sets must often be collected manually. Machines typically do not take advantage of previously learned knowledge when performing new tasks. And when confronted with a new situation, systems fail catastrophically. The goal of this research is to make it dramatically easier to teach vision systems new skills, and to design machines that can learn tasks faster by leveraging previously learned knowledge. In short, the aim is to develop computer vision systems that are largely self-taught. More specifically, this research will focus on problems such as learning from a small number of examples; using previously learned knowledge to improve performance on novel tasks; learning properties of one object that can be used to make inferences about other objects; acquiring and organizing information autonomously; and leveraging interdisciplinary techniques to help relieve people of the burden of "training" computers.

These capabilities are taken for granted in human beings, but their absence is a serious shortcoming of today's computer systems. A central tenet of this work is that it is impractical to train vision systems one problem at a time, acquiring large training sets and developing training paradigms for each task to be learned. There are many scenarios in which training data are severely limited (for example, only a limited number of photographs of Abraham Lincoln exist). And ideally, computer systems should be adaptive, and should not have to be prepared for each new task, especially when the new tasks are similar to previous ones. Some specific areas of investigation include learning to recognize any particular car or face from a single example, simply by watching other cars or faces as they move about; developing software for robots to continuously explore the visual world and the interactions between vision and the other senses; and learning to recognize typewritten text in a font never seen before, without ANY training examples of that font. The common thread in these efforts is that they relieve the burden on the person teaching the computer. The final goal is to develop computers that can be taught simply and rapidly, and that can explore on their own.

  • Front page of CAREER proposal

  • Main body of CAREER proposal

Publications


  • Dov Katz, Emily Horrell, Yuandong Yang, Brendan Burns, Thomas Buckley, Anna Grishkan, Volodymyr Zhylkovskyy, Oliver Brock, and Erik Learned-Miller.
    The UMass Mobile Manipulator UMan: An Experimental Platform for Autonomous Mobile Manipulation.
    In Workshop on Manipulation in Human Environments, at Robotics: Science and Systems, 2006.

  • Vidit Jain, Andras Ferencz and Erik Learned-Miller.
    Discriminative Training of Hyper-feature Models for Object Identification.
    Proceedings of the British Machine Vision Conference (BMVC), Volume 1, pp. 357-366, 2006.

  • Jerod Weinman and Erik Learned-Miller.
    Improving recognition of novel input with similarity.
    In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Volume 1, pp. 308-315, 2006.

  • Ron Bekkerman, Mehran Sahami and Erik Learned-Miller.
    Combinatorial Markov Random Fields.
    To appear in the 17th European Conference on Machine Learning (ECML), 2006.

Projects


  • Word recognition for the visually impaired

    Consider the problem of trying to identify the text on a custom-painted storefront sign. The letters may not belong to any standard font, the same letter may look different each time it appears, and if we are looking at the sign from a severe angle, the entire word may be distorted. In the context of this grant, one could say that we need to recognize the letters of a new font from ZERO training examples. Even so, we believe it should be possible to recognize a word without any specific knowledge of its font.

    Recently, Jerod Weinman and I published work that addresses the problem of recognizing novel types of text. We leverage the similarity among characters, in addition to their individual appearance, to classify characters in previously unseen fonts. This work integrates information about character appearance, character similarity, and a language model in a consistent probabilistic framework to improve accuracy on this difficult "unseen font" problem. The work is described in the following paper:

          Jerod Weinman and Erik Learned-Miller.
          Improving recognition of novel input with similarity.
          In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Volume 1, pp. 308-315, 2006.
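
    As a rough illustration of how these three sources of information can be combined, the sketch below scores every candidate labeling of a short word by summing log-scores for character appearance, pairwise glyph similarity, and a bigram language model, and keeps the best labeling. It is a toy under stated assumptions, not the model from the paper: the function names (`score`, `decode`), the weights, the exhaustive search, and all of the numbers are hypothetical and chosen only to make the effect visible.

```python
import itertools
import math

# Hypothetical floor used for bigrams missing from the toy language model.
UNSEEN_BIGRAM = math.log(1e-3)


def score(labels, appearance, same_prob, bigram, w_sim=1.0, w_lang=1.0):
    """Log-score of one candidate labeling of a word (toy model, assumed form)."""
    # Appearance: how well each glyph matches its proposed character.
    total = sum(appearance[i][c] for i, c in enumerate(labels))
    # Similarity: glyphs that look alike should receive the same label.
    for i, j in itertools.combinations(range(len(labels)), 2):
        p = same_prob[(i, j)]  # assumed probability that glyphs i and j match
        total += w_sim * math.log(p if labels[i] == labels[j] else 1.0 - p)
    # Language model: adjacent characters should form plausible bigrams.
    for a, b in zip(labels, labels[1:]):
        total += w_lang * bigram.get((a, b), UNSEEN_BIGRAM)
    return total


def decode(alphabet, appearance, same_prob, bigram, **weights):
    """Exhaustively search all labelings and return the best one as a string."""
    n = len(appearance)
    best = max(
        itertools.product(alphabet, repeat=n),
        key=lambda labels: score(labels, appearance, same_prob, bigram, **weights),
    )
    return "".join(best)


# Made-up example: three glyphs from an unseen font; the intended word is "eel".
alphabet = "cel"
appearance = [
    {"c": math.log(0.5), "e": math.log(0.4), "l": math.log(0.1)},   # ambiguous
    {"c": math.log(0.1), "e": math.log(0.8), "l": math.log(0.1)},   # clearly "e"
    {"c": math.log(0.05), "e": math.log(0.05), "l": math.log(0.9)}, # clearly "l"
]
# Glyphs 0 and 1 look nearly identical; glyph 2 looks unlike both.
same_prob = {(0, 1): 0.9, (0, 2): 0.05, (1, 2): 0.05}
bigram = {
    ("e", "e"): math.log(0.3),
    ("e", "l"): math.log(0.4),
    ("c", "e"): math.log(0.3),
}

without_similarity = decode(alphabet, appearance, same_prob, bigram, w_sim=0.0)
with_similarity = decode(alphabet, appearance, same_prob, bigram)
```

    With the similarity term switched off (`w_sim=0.0`), the toy decoder returns "cel", because the first glyph on its own looks slightly more like a "c". With the term on, the strong resemblance between the first two glyphs pulls the answer to "eel". Exhaustive search is only workable for very short words; a practical system would need approximate inference.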

    While many researchers in computer vision consider text recognition an "easy" problem, there is still no software that can successfully recognize the full variety of words as they appear in complex environments, such as on storefronts, street signs, or movie marquees.

  • Behavioral robotics

  • Recognition from one example

    How can I recognize a person when I have seen only a single picture of that person before? This is a particularly challenging recognition problem because so many variables affect a person's appearance. The same person may appear with different facial expressions, hairstyles, or facial hair. They may be wearing glasses one day, but not the next. They may go to the beach and get a tan. We have been developing a method called "hyper-feature" recognition, originally conceived by Andras Ferencz at UC Berkeley, to solve the problem of face recognition from one example. Recently, Vidit Jain at UMass has improved this system using discriminative training techniques. This work is described in the following paper:

          Vidit Jain, Andras Ferencz and Erik Learned-Miller.
          Discriminative Training of Hyper-feature Models for Object Identification.
          Proceedings of the British Machine Vision Conference (BMVC), Volume 1, pp. 357-366, 2006.