Emma Strubell

I'm a Ph.D. candidate at UMass Amherst working in the Information Extraction and Synthesis Laboratory with Professor Andrew McCallum. Previously, I earned a B.S. in Computer Science from the University of Maine with a minor in math, where I applied models from mathematical biology to the spread of internet worms with Professor David Hiebeler in his Spatial Population Ecological and Epidemiological Dynamics Lab.

In summer 2016 I interned as a Research Scientist with Tom Kollar on the Alexa NLU team at Amazon Lab126. In 2017 I interned with Daniel Andor and David Weiss at Google Research (SAFT) in NYC.

I am very grateful to be supported by an IBM Ph.D. Fellowship Award as of fall 2017.

Research Interests

I am interested in developing new machine learning techniques to facilitate fast (and accurate) natural language processing of text.

NLP tasks are commonly modeled as structured prediction problems, in which we learn a mapping from the space of possible inputs to an exponentially large space of labels. Since classifying into the exponentially large space of all possible structures is intractable, we need to decompose the structure in a way that allows us to perform more efficient inference.

One way to do this is to define the variables that describe the structure and a sparse set of dependencies between them, which results in familiar graphical models such as HMMs or CRFs. An alternative approach is to organize the space of possible classes into a structure that allows for more efficient classification. The resulting classification decisions do not correspond directly to variable boundaries, but to some other latent structure in the space of classes, such as a low-rank neural embedding.

I am interested in the latter approach, which I believe will allow for fast inference in conjunction with the accuracy gains from using large, joint output spaces.
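To make the contrast concrete, here is a minimal sketch of the first (graphical model) approach described above, assuming a linear-chain model in which each label depends only on the previous one. The function names `viterbi` and `brute_force` and the toy scores are hypothetical, chosen only for illustration. The chain assumption lets dynamic programming find the best of K^T label sequences in O(T·K²) time, whereas exhaustive search must score every sequence.

```python
from itertools import product


def viterbi(emit, trans):
    """Best label sequence under a linear-chain model.

    emit[t][k]  : score of label k at position t (hypothetical toy scores)
    trans[i][j] : score of label i being followed by label j
    Runs in O(T * K^2) rather than enumerating all K^T sequences.
    """
    T, K = len(emit), len(emit[0])
    best = list(emit[0])          # best[k]: best score of a prefix ending in k
    back = []                     # back-pointers for positions 1..T-1
    for t in range(1, T):
        prev, best, ptr = best, [], []
        for j in range(K):
            i_star = max(range(K), key=lambda i: prev[i] + trans[i][j])
            ptr.append(i_star)
            best.append(prev[i_star] + trans[i_star][j] + emit[t][j])
        back.append(ptr)
    last = max(range(K), key=lambda k: best[k])
    path = [last]
    for ptr in reversed(back):    # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


def brute_force(emit, trans):
    """Same objective, but scoring every one of the K^T sequences."""
    T, K = len(emit), len(emit[0])

    def score(seq):
        unary = sum(emit[t][y] for t, y in enumerate(seq))
        pairwise = sum(trans[a][b] for a, b in zip(seq, seq[1:]))
        return unary + pairwise

    best_seq = max(product(range(K), repeat=T), key=score)
    return list(best_seq), score(best_seq)


# Toy problem: 3 positions, 2 labels, a small penalty for switching labels.
emit = [[2, 0], [0, 3], [4, 0]]
trans = [[0, -1], [-1, 0]]
path, score = viterbi(emit, trans)
```

Both functions return the same optimum on this toy input; only the dynamic program remains feasible as the sequence length grows.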



In my spare time, I enjoy cooking (with a focus on making vegetables delicious), fermenting (kombucha, sauerkraut, kimchi), growing plants (especially succulents), and enjoying the outdoors (hiking, camping, and rock climbing).

In search of a fast Scala lexer, I forked JFlex and added the ability to emit Scala code. JFlex-scala and its corresponding Maven and sbt plugins are available on Maven Central. For an example of its use, check out the tokenizer in FACTORIE.

I am also co-author of Plant Jones. He is a semi-intelligent plant who tweets negatively about water when he's thirsty, and positively when he's not. His code is available here.

In my junior year of college I wrote and presented a tutorial on quantum algorithms aimed at undergraduate students in computer science, available here, along with slides part 1 and part 2.

I have been a proud and happy Gentoo Linux user since 2005.

Amherst, Massachusetts, USA


strubell [at] cs [dot] umass [dot] edu