Towards Universal Analyzers Of Natural Language
How can we enable computers to process natural language in multilingual settings (think airports, social media, etc.)? Thanks to decades of natural language processing (NLP) research, we have developed language analyzers for many languages, including languages with little or no training data. I will review some of the key developments in multilingual NLP research, with an emphasis on syntax.
Despite this progress, the mainstream approach to developing multilingual NLP models for high-resource languages has been to train one model per language independently, which is unsatisfactory for both practical and theoretical reasons. To address this, I will describe a general framework for training language-universal models, with competitive results in two instantiations of the framework: dependency parsing and language modeling.
Waleed Ammar is finishing his PhD at CMU and will soon join the Allen Institute for Artificial Intelligence (AI2). Before his PhD, Waleed was a software engineer in the machine translation group at Microsoft Research. He received two tech transfer awards at Microsoft Research and a Google PhD Fellowship in natural language processing.