Emma Strubell

I'm a Ph.D. student at UMass Amherst working in the Information Extraction and Synthesis Laboratory with Professor Andrew McCallum. Previously, I earned a B.S. in Computer Science from the University of Maine with a minor in math, where I applied models from mathematical biology to the spread of internet worms with Professor David Hiebeler in his Spatial Population Ecological and Epidemiological Dynamics Lab.

Research Interests

I am interested in developing new machine learning techniques to facilitate fast (and accurate) natural language processing of text.

NLP tasks are commonly modeled as structured prediction problems, in which we learn a mapping from the space of possible inputs to an exponentially large space of labels. Since classifying into the exponentially large space of all possible structures is intractable, we need to decompose the structure in a way that allows us to perform more efficient inference.

One way to do this is to define the variables that describe the structure and a sparse set of dependencies between them, which results in familiar graphical models such as HMMs or CRFs. An alternative approach is to organize the space of possible classes into a structure that allows for more efficient classification. The resulting classification decisions do not correspond directly to variable boundaries, but to some other latent structure in the space of classes, such as a low-rank neural embedding.

I am interested in the latter approach, which I believe will allow for fast inference in conjunction with the accuracy gains from using large, joint output spaces.
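As a rough illustration of the first approach (not code from any of my projects), here is a minimal sketch of Viterbi decoding for a linear-chain model. It assumes additive scores, and the names `emit`, `trans`, `viterbi`, and `brute_force` are made up for this example. Because each label depends only on its neighbor, the best of k^n label sequences can be found in O(n k^2) time rather than by enumerating them all.

```python
import itertools


def viterbi(emit, trans):
    """Best label sequence under a linear-chain (HMM/CRF-style) scoring.

    emit[t][y]  : score of label y at position t (hypothetical input format)
    trans[a][b] : score of label a being followed by label b
    """
    n, k = len(emit), len(trans)
    best = list(emit[0])  # best score of any prefix ending in each label
    back = []             # back[t-1][b] = best previous label for b at t
    for t in range(1, n):
        new_best, pointers = [], []
        for b in range(k):
            # Only the previous label matters: this is the sparse dependency.
            a = max(range(k), key=lambda a: best[a] + trans[a][b])
            pointers.append(a)
            new_best.append(best[a] + trans[a][b] + emit[t][b])
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final label.
    last = max(range(k), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


def brute_force(emit, trans):
    """Same answer by scoring all k**n sequences; feasible only for tiny inputs."""
    n, k = len(emit), len(trans)

    def score(seq):
        total = sum(emit[t][y] for t, y in enumerate(seq))
        return total + sum(trans[a][b] for a, b in zip(seq, seq[1:]))

    best_seq = max(itertools.product(range(k), repeat=n), key=score)
    return list(best_seq), score(best_seq)
```

On small inputs the two functions should agree, which makes `brute_force` a convenient sanity check; only `viterbi` remains practical as the sequence length grows.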



In my spare time, I enjoy cooking (with a focus on making vegetables delicious), fermenting (kombucha, sauerkraut, kimchi), growing plants (especially succulents), and enjoying the outdoors (hiking and camping).

In search of a fast Scala lexer, I forked JFlex and added the ability to emit Scala code. JFlex-scala and its corresponding Maven and sbt plugins are available on Maven Central. For an example of its use, check out the tokenizer in FACTORIE.

Patrick Verga and I made an Android app for the NationStates online diplomacy game, which now has almost 10,000 downloads. Some day we will clean up the code and open source it!

I am also co-author of Plant Jones. He is a semi-intelligent plant who tweets negatively about water when he's thirsty, and positively when he's not. His code is available here.

In my junior year of college, I wrote and presented a tutorial on quantum algorithms aimed at undergraduate students in computer science, available here, along with slides part 1 and part 2.

I am a proud and happy Gentoo Linux user since 2005.

Amherst, Massachusetts, USA


strubell [at] cs [dot] umass [dot] edu

Resume (PDF)