I'm a Ph.D. candidate at UMass Amherst working in the Information Extraction and Synthesis Laboratory with Professor Andrew McCallum. Previously, I earned a B.S. in Computer Science from the University of Maine with a minor in math, where I applied models from mathematical biology to the spread of internet worms with Professor David Hiebeler in his Spatial Population Ecological and Epidemiological Dynamics Lab.
I am very grateful to be supported by an IBM Ph.D. Fellowship Award as of fall 2017.
I am interested in developing new machine learning techniques to facilitate fast (and accurate) natural language processing of text.
NLP tasks are commonly modeled as structured prediction problems, in which we learn a mapping from the space of possible inputs to an exponentially large space of labels. Since classifying into the exponentially large space of all possible structures is intractable, we need to decompose the structure in a way that allows us to perform more efficient inference.
One way to do this is to define the variables that describe the structure and a sparse set of dependencies between them, which results in familiar graphical models such as HMMs or CRFs. An alternative approach is to organize the space of possible classes into a structure that allows for more efficient classification. The resulting classification decisions do not directly correspond to variable boundaries, but to some other latent structure in the space of classes, such as a low-rank neural embedding.
I am interested in the latter approach, which I believe will allow for fast inference in conjunction with the accuracy gains from using large, joint output spaces.
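To make the first kind of decomposition above concrete, here is a minimal sketch (not code from any of the papers listed below) of inference in a first-order chain model of the HMM/CRF family. It assumes a sequence of T tokens, K labels, and additive scores; the names emit, trans, brute_force and viterbi are hypothetical and chosen only for illustration. Enumerating every labeling costs K**T score evaluations, while restricting dependencies to adjacent labels lets dynamic programming find the same best labeling in O(T * K**2).

```python
import itertools


def score(path, emit, trans):
    """Total score of a label sequence under a first-order chain model.

    emit[t][k]  : score of label k at position t (illustrative values)
    trans[i][j] : score of moving from label i to label j
    """
    total = emit[0][path[0]]
    for t in range(1, len(path)):
        total += trans[path[t - 1]][path[t]] + emit[t][path[t]]
    return total


def brute_force(emit, trans):
    """Score all K**T label sequences: exponential in sequence length."""
    num_labels = len(trans)
    candidates = itertools.product(range(num_labels), repeat=len(emit))
    return max(candidates, key=lambda p: score(p, emit, trans))


def viterbi(emit, trans):
    """Exploit the sparse (adjacent-label) dependencies: O(T * K**2)."""
    num_labels = len(trans)
    best = list(emit[0])  # best[k]: best score of any prefix ending in k
    back = []             # back[t - 1][j]: best previous label for j at t
    for t in range(1, len(emit)):
        prev = best
        best, pointers = [], []
        for j in range(num_labels):
            i = max(range(num_labels), key=lambda i: prev[i] + trans[i][j])
            best.append(prev[i] + trans[i][j] + emit[t][j])
            pointers.append(i)
        back.append(pointers)
    # Follow back-pointers from the best final label to recover the path.
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return tuple(reversed(path))
```

Both functions return a tuple of label indices, and on inputs without tied scores they agree. The difference is only in cost, which is the point of decomposing the structure.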
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions. Emma Strubell, Patrick Verga, David Belanger, and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmark. September 2017. [bibtex] [code]
- Dependency Parsing with Dilated Iterated Graph CNNs. Emma Strubell and Andrew McCallum. 2nd Workshop on Structured Prediction for Natural Language Processing (EMNLP WS). Copenhagen, Denmark. September 2017. [bibtex]
- Machine-learned and codified synthesis parameters of oxide materials. Edward Kim, Kevin Huang, Alex Tomala, Sara Matthews, Emma Strubell, Adam Saunders, Andrew McCallum and Elsa Olivetti. Nature Scientific Data. 4. 2017. [bibtex]
- An epidemiological model of internet worms with hierarchical dispersal and spatial clustering of hosts. David E. Hiebeler, Andrew Audibert, Emma Strubell and Isaac J. Michaud. Journal of Theoretical Biology. 418: 8-15. 2017. [bibtex]
- Extracting Multilingual Relations under Limited Resources: TAC 2016 Cold-Start KB construction and Slot-Filling using Compositional Universal Schema. Haw-Shiuan Chang, Abdurrahman Munir, Ao Liu, Johnny Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '16 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2016. [bibtex]
- Multilingual Relation Extraction using Compositional Universal Schema. Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth and Andrew McCallum. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). San Diego, California. June 2016. [bibtex] [code]
- Building Knowledge Bases with Universal Schema: Cold Start and Slot-Filling Approaches. Benjamin Roth, Nicholas Monath, David Belanger, Emma Strubell, Patrick Verga and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '15 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2015. [bibtex]
- Learning Dynamic Feature Selection for Fast Sequential Prediction. Emma Strubell, Luke Vilnis, Kate Silverstein and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015. Outstanding paper award. [video] [slides] [poster] [bibtex]
- Training for Fast Sequential Prediction Using Dynamic Feature Selection. Emma Strubell, Luke Vilnis, and Andrew McCallum. NIPS Workshop on Modern Machine Learning and NLP (NIPS WS). Montreal, Quebec, Canada. December 2014. [bibtex]
- Minimally Supervised Event Argument Extraction using Universal Schema. Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew McCallum. 4th Workshop on Automated Knowledge Base Construction (AKBC). At NIPS '14, Montreal, Quebec, Canada. December 2014. [bibtex]
- Universal Schema for Slot-Filling, Cold-Start KBP and Event Argument Extraction: UMassIESL at TAC KBP 2014. Benjamin Roth, Emma Strubell, John Sullivan, Lakshmi Vikraman, Katherine Silverstein, and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '14 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2014. [bibtex]
- Modeling the Spread of Biologically-Inspired Internet Worms. Emma Strubell. Undergraduate honors thesis. University of Maine Honors College, Orono, Maine, USA. May 2012. [bibtex]
In my spare time, I enjoy cooking (with a focus on making vegetables delicious), fermenting (kombucha, sauerkraut, kimchi), growing plants (especially succulents), and enjoying the outdoors (hiking, camping and rock climbing).
In search of a fast Scala lexer, I forked JFlex and added the ability to emit Scala code. JFlex-scala, and its corresponding Maven and sbt plugins, are available on Maven Central. For an example of its use, check out the tokenizer in FACTORIE.
I have been a proud and happy Gentoo Linux user since 2005.
Amherst, Massachusetts, USA
strubell [at] cs [dot] umass [dot] edu