Fast, Accurate and Robust Multilingual Syntactic Analysis
Abstract: To build computer systems that can 'understand' natural language, we need to go beyond bag-of-words models and take the grammatical structure of language into account. Part-of-speech tag sequences and dependency parse trees are one form of such structural analysis that is easy to understand and use. This talk will cover three topics. First, I will present a coarse-to-fine architecture for dependency parsing that uses linear-time vine pruning and structured prediction cascades. The resulting pruned third-order model is twice as fast as an unpruned first-order model and compares favorably to a state-of-the-art transition-based parser in terms of speed and accuracy. I will then present a simple online algorithm for training structured prediction models with extrinsic loss functions. By tuning a parser with a loss function for machine translation reordering, we can show that parsing accuracy matters for downstream application quality, producing improvements of more than 1 BLEU point on an end-to-end machine translation task. Finally, I will present approaches for projecting part-of-speech taggers and syntactic parsers across language boundaries, allowing us to build models for languages with no labeled training data. Our projected models significantly outperform state-of-the-art unsupervised models and constitute a first step towards a universal parser.
This is joint work with Ryan McDonald, Keith Hall, Dipanjan Das, Alexander Rush, Michael Ringgaard and Kuzman Ganchev (a.k.a. the Natural Language Parsing Team at Google).
Bio: Slav Petrov is a Senior Research Scientist in Google's New York office. He works on problems at the intersection of natural language processing and machine learning. He is particularly interested in syntactic parsing and its applications to machine translation and information extraction. He also teaches a class on Statistical Natural Language Processing at New York University every fall.
Before joining Google, Slav completed his PhD degree at UC Berkeley, where he worked with Dan Klein. He holds a Master's degree from the Free University of Berlin, and also spent a year as an exchange student at Duke University. Slav was a member of the FU-Fighters team that won the RoboCup 2004 world championship in robotic soccer, and he recently won a best paper award at ACL 2011 for his work on multilingual syntactic analysis.
Slav grew up in Berlin, Germany, but is originally from Sofia, Bulgaria. He therefore considers himself a Berliner from Bulgaria. Whenever Bulgaria plays Germany in soccer, he supports Bulgaria.