Deep Reasoning About Word Embeddings And Linguistic Regularities Using Kernels On Lie Groups
It is estimated that an average adult human would need more than a decade to read through all the pages of Wikipedia. Exploiting the many advances in computing hardware, optimization methods, programming paradigms, and storage technology, it is now possible to process all of Wikipedia (roughly 3.5 billion words) in about half a day on a personal workstation. Recent work has shown that recursive deep learning neural networks can learn continuous vector-space word representations that reflect the underlying semantics of words from a large corpus like Wikipedia. Simple vector-space arithmetic using cosine distances has been shown to capture certain types of analogies, such as reasoning about plurals from singulars, past tense from present tense, and family relationships.
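The analogy-by-arithmetic idea can be sketched in a few lines: to answer "a is to b as c is to ?", add the offset b − a to c and take the nearest word by cosine similarity. The embeddings below are toy hand-made vectors chosen only to illustrate the mechanics, not representations learned from a corpus.

```python
# Sketch of analogy reasoning via vector arithmetic and cosine similarity.
# The 3-d embeddings here are illustrative toys, not learned word vectors.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vocab):
    """Answer 'a is to b as c is to ?' by the nearest neighbor of b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman", emb))  # -> "queen" with these toy vectors
```

With real embeddings the same procedure runs over a vocabulary of hundreds of thousands of words; the query words themselves are excluded from the candidates, as is standard in the Mikolov et al. evaluation.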
In this talk, I will present a new approach to reasoning about linguistic regularities in continuous word representations, based on modeling the vector subspaces spanned by groups of related words. We exploit the fact that the set of k-dimensional subspaces of an ambient n-dimensional Euclidean space forms a curved matrix manifold called the Grassmannian, which arises as a quotient space of the Lie group of rotations in n dimensions. Based on this mathematical model, I will introduce a modified cosine distance computed from one-parameter exponential flows along the shortest-path geodesics between subspaces representing related word groups. I will show how to learn kernels on Lie groups that capture relation-specific distances across word categories. Testing this approach, using word embeddings learned on a 3-billion-word Wikipedia corpus and the word analogy tasks studied by Mikolov et al. at Google, reveals substantial improvements in performance over previous work. I will discuss applications of the matrix manifold framework to other topics in IR and NLP, such as deep QA, ranking, relational knowledge-base completion, and machine translation. If time permits, I will describe the application of new machine learning algorithms based on variational inequalities to various problems in NLP.
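To make the geometry concrete, here is a minimal sketch of the standard geodesic distance on the Grassmannian: the distance between two k-dimensional subspaces is the 2-norm of their principal angles, recovered from the singular values of the product of orthonormal bases. This is the textbook construction, not the talk's specific kernel; the function name and the toy subspaces are illustrative assumptions.

```python
# Minimal sketch: geodesic distance between subspaces on the Grassmannian,
# via principal angles computed from an SVD. Not the talk's actual kernel.
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between span(A) and span(B); columns are basis vectors."""
    Qa, _ = np.linalg.qr(A)          # orthonormalize each basis
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)        # guard against round-off outside [-1, 1]
    theta = np.arccos(s)             # principal angles between the subspaces
    return float(np.linalg.norm(theta))

# Two 2-dimensional subspaces of R^4, spanned by the columns below.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

print(grassmann_distance(A, A))  # 0.0: identical subspaces
print(grassmann_distance(A, B))  # pi/2: one shared direction, one orthogonal
```

In the subspace view of word groups, each column matrix would instead hold the embeddings of a related word category, and a learned kernel on these distances replaces the raw geodesic length.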
BIO: Sridhar Mahadevan is a professor and Director of the Autonomous Learning Laboratory at the School of Computer Science, University of Massachusetts, Amherst. He was elected AAAI Fellow in 2014 for "significant contributions to machine learning". He is on sabbatical for the 2014-2015 year at IBM Research in New York, where he is working on machine learning methods for the Watson project. In January he will present a tutorial at AAAI 2015 entitled "Generalizing Optimization to Equilibration: A New Foundation for AI in the 21st Century".