I'm a Ph.D. candidate at UMass Amherst working in the Information Extraction and Synthesis Laboratory with Professor Andrew McCallum. Previously, I earned a B.S. in Computer Science from the University of Maine with a minor in math, where I applied models from mathematical biology to the spread of internet worms with Professor David Hiebeler in his Spatial Population Ecological and Epidemiological Dynamics Lab.
I am very grateful to be supported by an IBM Ph.D. Fellowship Award as of fall 2017.
I am interested in developing new machine learning techniques to facilitate fast (and accurate) natural language processing of text.
Techniques for low-level NLP tasks such as part-of-speech tagging, named entity recognition and syntactic dependency parsing are now accurate enough to be of use to practitioners who wish to extract structured information from unstructured text. Such text can include blog posts and discussion forums on the web, or the text of scientific research papers. Though we now wish to deploy these tools on billions of documents, many of the most accurate models were designed with no regard for computational cost. In response, our work aims to design machine learning algorithms to facilitate fast inference in NLP models while sacrificing as little accuracy as possible.
My research focuses on two avenues for improving the speed-accuracy trade-off. First, we develop models that quickly build up rich representations of tokens in context, which are used as features in a sequential prediction model where sequence labeling is performed as a series of independent multi-class classifications. This approach allows for much faster inference than, e.g., structured prediction in a graphical model, while maintaining accuracy via high-quality feature representations that incorporate wide context and a concept of neighboring labels. Second, we unify related NLP tasks into a single end-to-end model which reasons in the joint space of output labels. With this approach we aim to increase accuracy by reducing cascading errors and leveraging shared statistics of co-occurring labels, while at the same time decreasing wall-clock runtime by sharing model parameters and computation across tasks.
- Simultaneously Self-attending to All Mentions for Full-Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell and Andrew McCallum. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana. June 2018. [bibtex] [code]
- Multi-Task Learning For Parsing The Alexa Meaning Representation Language. Vittorio Perera, Tagyoung Chung, Thomas Kollar and Emma Strubell. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI). New Orleans, Louisiana. February 2018. [bibtex]
- Automatically Extracting Action Graphs From Materials Science Synthesis Procedures. Sheshera Mysore, Edward Kim, Emma Strubell, Ao Liu, Haw-Shiuan Chang, Srikrishna Kompella, Kevin Huang, Andrew McCallum and Elsa Olivetti. NIPS Workshop on Machine Learning for Molecules and Materials (NIPS WS). Long Beach, California. December 2017. Spotlight talk. [bibtex] [poster] [slides]
- Attending to All Mention Pairs for Full Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell, Ofer Shai, and Andrew McCallum. 6th Workshop on Automated Knowledge Base Construction (AKBC). Long Beach, California. December 2017. [bibtex]
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions. Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmark. September 2017. [bibtex] [code] [poster]
- Dependency Parsing with Dilated Iterated Graph CNNs. Emma Strubell and Andrew McCallum. 2nd Workshop on Structured Prediction for Natural Language Processing (EMNLP WS). Copenhagen, Denmark. September 2017. [bibtex] [slides]
- Machine-learned and codified synthesis parameters of oxide materials. Edward Kim, Kevin Huang, Alex Tomala, Sara Matthews, Emma Strubell, Adam Saunders, Andrew McCallum and Elsa Olivetti. Nature Scientific Data. 4. 2017. [bibtex]
- An epidemiological model of internet worms with hierarchical dispersal and spatial clustering of hosts. David E. Hiebeler, Andrew Audibert, Emma Strubell and Isaac J. Michaud. Journal of Theoretical Biology. 418: 8–15. 2017. [bibtex]
- Extracting Multilingual Relations under Limited Resources: TAC 2016 Cold-Start KB construction and Slot-Filling using Compositional Universal Schema. Haw-Shiuan Chang, Abdurrahman Munir, Ao Liu, Johnny Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '16 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2016. [bibtex]
- Multilingual Relation Extraction using Compositional Universal Schema. Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth and Andrew McCallum. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). San Diego, California. June 2016. [bibtex] [code]
- Building Knowledge Bases with Universal Schema: Cold Start and Slot-Filling Approaches. Benjamin Roth, Nicholas Monath, David Belanger, Emma Strubell, Patrick Verga and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '15 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2015. [bibtex]
- Learning Dynamic Feature Selection for Fast Sequential Prediction. Emma Strubell, Luke Vilnis, Kate Silverstein and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015. Outstanding paper award. [video] [slides] [poster] [bibtex]
- Training for Fast Sequential Prediction Using Dynamic Feature Selection. Emma Strubell, Luke Vilnis, and Andrew McCallum. NIPS Workshop on Modern Machine Learning and NLP (NIPS WS). Montreal, Quebec, Canada. December 2014. [bibtex]
- Minimally Supervised Event Argument Extraction using Universal Schema. Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew McCallum. 4th Workshop on Automated Knowledge Base Construction (AKBC). At NIPS '14, Montreal, Quebec, Canada. December 2014. [bibtex]
- Universal Schema for Slot-Filling, Cold-Start KBP and Event Argument Extraction: UMassIESL at TAC KBP 2014. Benjamin Roth, Emma Strubell, John Sullivan, Lakshmi Vikraman, Katherine Silverstein, and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '14 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2014. [bibtex]
- Modeling the Spread of Biologically-Inspired Internet Worms. Emma Strubell. Undergraduate honors thesis. University of Maine Honors College, Orono, Maine, USA. May 2012. [bibtex]
In my spare time, I enjoy cooking (with a focus on making vegetables delicious), fermenting (kombucha, kimchi, yogurt), growing plants (especially succulents), and enjoying the outdoors (backpacking and rock climbing).
In search of a fast Scala lexer, I forked JFlex and added the ability to emit Scala code. JFlex-scala, and its corresponding Maven and sbt plugins, are available on Maven Central. For an example of its use, check out the tokenizer in FACTORIE.
Gentoo Linux user since 2005.
Amherst, Massachusetts, USA
strubell [at] cs [dot] umass [dot] edu