Learning To Understand Natural Language With Less Supervision
Language understanding is the problem of mapping natural language text to a semantic representation grounded in the real world. Many real-world applications depend on language understanding, including information extraction, robot command understanding, and natural language interfaces.
Obtaining labeled training data is a major challenge in language understanding. Web-scale knowledge bases have hundreds or thousands of predicates, and we further expect their predicate vocabularies to grow over time. Annotating even a few examples per predicate is therefore a substantial burden. Similarly, systems that incorporate environmental context must learn the name of every object in their environment. Manually annotating a corpus of thousands of semantic parses or object names is expensive. These applications therefore need training procedures that rely on readily available data.
In this talk, I present two methods for training language understanding systems using readily available data. The first method uses distant supervision to train a semantic parser from a corpus of unlabeled text and a web-scale knowledge base, eliminating the need for per-sentence semantic annotations. We find that a semantic parser trained in this fashion outperforms a state-of-the-art relation extractor on an information extraction task. The second method trains a semantic parser to understand natural language referencing a situated environment, such as an image, in order to answer queries such as "is the blue mug to the left of the monitor?" We develop a weakly-supervised learning algorithm for this task that performs comparably to a fully-supervised algorithm while using significantly simpler annotations.
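To illustrate the distant-supervision idea behind the first method, the sketch below shows the standard heuristic of aligning knowledge-base triples with unlabeled sentences: any sentence mentioning both arguments of a known relation instance becomes a (noisy) positive training example, with no per-sentence annotation. The toy knowledge base, sentences, and function names are illustrative assumptions, not the system described in the talk.

```python
# A minimal sketch of distant supervision, assuming a toy knowledge
# base of (subject, object) -> relation entries. All entities,
# relations, and sentences here are hypothetical examples.

KB = {
    ("Pittsburgh", "Pennsylvania"): "locatedIn",
    ("CMU", "Pittsburgh"): "headquarteredIn",
}

SENTENCES = [
    "Pittsburgh is a city in western Pennsylvania .",
    "CMU hosts a robotics institute in Pittsburgh .",
    "Pittsburgh hosted the summit .",
]

def distant_label(sentences, kb):
    """Pair each sentence that mentions both arguments of a KB triple
    with that triple's relation; the pairs serve as noisy positive
    training examples, obtained without manual annotation."""
    examples = []
    for sent in sentences:
        tokens = sent.split()
        for (subj, obj), rel in kb.items():
            if subj in tokens and obj in tokens:
                examples.append((sent, subj, rel, obj))
    return examples

for sent, subj, rel, obj in distant_label(SENTENCES, KB):
    print(f"{rel}({subj}, {obj}) <- {sent}")
```

The resulting labels are noisy (the third sentence yields no example, but a sentence could also mention both entities without expressing the relation), which is why methods trained this way must be robust to label noise.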
Jayant Krishnamurthy is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. Prior to attending Carnegie Mellon, he received M.Eng. and S.B. degrees from the Massachusetts Institute of Technology. Jayant's research is on machine learning and natural language processing, with a focus on understanding the semantics of natural language. His work is part of the Never-Ending Language Learner (NELL) project at Carnegie Mellon, directed by Tom Mitchell.