Research
The main goal of my research is
to dramatically increase our ability to mine actionable knowledge from
unstructured text. I am especially interested in information extraction
from the Web, understanding the connections between people and between
organizations, expert finding, social network analysis, and mining the
scientific literature and community. Toward this end, my group
develops and employs various methods in statistical machine learning,
natural language processing, information retrieval and data
mining---tending toward probabilistic approaches and graphical models.
For more information, see our current projects and publications.
News
- We are building an "open reviewing" system for ICLR 2013 and other venues. If you are interested in alternative approaches to peer review, please talk with me!
- FACTORIE is a toolkit for deployable probabilistic
modeling, implemented as a software library in Scala. It provides its
users with a succinct language for creating relational factor graphs,
estimating parameters and performing inference.
- I was the General Chair of ICML 2012, with Program
Chairs Joelle Pineau and John Langford.
- Generalized Expectation is an accurate way to train models by labeling
features.
- We have publicly launched Rexa,
a new research paper search engine. It is a sibling to CiteSeer and
Google Scholar, except that it provides search and browsing over more
"object types", including not just papers, but also people, grants and
topics.
- Charles Sutton and I have a comprehensive introduction to conditional random
fields now published by Foundations and Trends in Machine Learning.
- I've written an introduction to information extraction by
machine learning, intended for an audience that doesn't know machine
learning. Information Extraction:
Distilling Structured Data from Unstructured Text. Andrew
McCallum. ACM Queue, Volume 3, Number 9, November 2005.
- MALLET is a Java
toolkit for machine learning applied to natural language. It provides
facilities for document classification, information extraction,
part-of-speech tagging, noun phrase segmentation, general finite state
transducers and classification, and much more---all designed to be
extremely efficient for large data and feature sets. Although the toolkit
is quite mature in functionality, its documentation is still sparse.
- An analysis of topical trends in
the five years of ICML before 2008.
- Three of my papers made it into CiteSeer's list of most cited
computer science papers.