
Publications
2015
 Structured Prediction Energy Networks. David Belanger, Andrew McCallum. arXiv preprint, submitted to ICLR, 2016.
 Multilingual Relation Extraction using Compositional Universal Schema.
Pat Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew
McCallum. arXiv preprint, submitted to ICLR, 2016.
 Word Representations via Gaussian Embedding. Luke Vilnis, Andrew McCallum. International Conference on Learning Representations (ICLR) oral presentation, 2015.
 Compositional Vector Space Models for Knowledge Base Inference. Arvind Neelakantan, Benjamin Roth, Andrew McCallum. AAAI Spring Symposium Series (AAAISS), 2015.
 Bethe Projections for Non-Local Inference. Luke Vilnis, David Belanger, Dan Sheldon, Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2015.
 Learning Dynamic Feature Selection for Fast Sequential Prediction.
Emma Strubell, Luke Vilnis, Kate Silverstein and Andrew McCallum.
Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015. Outstanding paper award.
 Compositional Vector Space Models for Knowledge Base Completion. Arvind Neelakantan, Benjamin Roth and Andrew McCallum.
Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015.
2014
 Training for Fast Sequential Prediction Using Dynamic Feature Selection. Emma Strubell, Luke Vilnis, and Andrew McCallum. NIPS Workshop on Modern Machine Learning and NLP (NIPS WS). Montreal, Quebec, Canada. December 2014.
 Knowledge Base Completion using Compositional Vector Space Models. Arvind Neelakantan, Benjamin Roth and Andrew McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) 2014 at NIPS. Outstanding Paper Award.
 Minimally Supervised Event Argument Extraction using Universal Schema.
Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew
McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) at NIPS, Montreal, Quebec, Canada. December 2014.
 Universal Schema for Slot-Filling, Cold-Start KBP and Event Argument Extraction: UMass IESL at TAC KBP 2014.
Benjamin Roth, Emma Strubell, John Sullivan, Lakshmi Vikraman,
Katherine Silverstein, and Andrew McCallum. Text Analysis Conference
(Knowledge Base Population Track) '14 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2014.
 Efficient Nonparametric Estimation of Multiple Embeddings per Word in Vector Space. Arvind Neelakantan, Jeevan Shankar, Alexandre Passos and Andrew McCallum. Conference on Empirical
Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2014.
 A Hierarchical Model for Universal Schema Relation Extraction.
Arvind Neelakantan, Alexandre Passos, Andrew McCallum. Workshop
on Automatic Creation and Curation of Knowledge Bases (WACCK) at SIGMOD, 2014.
 Message Passing for Soft Constraint Dual Decomposition. David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum. Uncertainty in Artificial Intelligence (UAI), 2014.
 Lexicon Infused Phrase Embeddings for Named Entity Resolution. Alexandre Passos, Vineet Kumar, Andrew McCallum. Conference on Computational Natural Language Learning (CoNLL), 2014.
 Learning Soft Linear Constraints with Application to Citation Field Extraction. Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum. Proceedings of the
Association for Computational Linguistics (ACL), 2014.
2013
 Optimization and Learning in FACTORIE. Alexandre Passos, Luke Vilnis,
Andrew McCallum. Neural Information Processing Systems Workshop on
Optimization for Machine Learning (NIPS WS), 2013.
 Marginal Inference in MRFs using Frank-Wolfe. David Belanger, Dan Sheldon, Andrew McCallum. Neural Information
Processing Systems Workshop on Greedy Optimization, Frank-Wolfe and Friends (NIPS WS), 2013.
 Anytime Belief Propagation Using Sparse Domains.
Sameer Singh, Sebastian Riedel, Andrew McCallum. Neural
Information Processing Systems Workshop on Resource-Efficient Machine
Learning (NIPS WS), 2013.
 Universal Schema for Slot Filling and Cold Start: UMass IESL at TAC-KBP.
Sameer Singh, David Belanger, Ari Kobren, Michael Wick, Alexandre
Passos, Harshal Pandya, Jinho Choi, Brian Martin, Andrew
McCallum. Text Analysis Conference (TAC), 2013.
 Universal Schema for Entity Type Prediction.
Limin Yao, Sebastian Riedel, Andrew McCallum. Third International
Workshop on Automated Knowledge Base Construction (AKBC), 2013.
 A Joint Model for Discovering and Linking Entities.
Michael Wick, Sameer Singh, Harshal Pandya, Andrew McCallum.
Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
 Assessing Confidence of Knowledge Base Content with an Experimental Study in Entity Resolution.
Michael Wick, Sameer Singh, Ari Kobren, Andrew McCallum. Third
International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
 Joint Inference of Entities, Relations, and Coreference.
Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, Andrew
McCallum. Third International Workshop on Automated Knowledge
Base Construction (AKBC), 2013.
 Dynamic Knowledge Base Alignment for Coreference Resolution.
Jiaping Zheng, Luke Vilnis, Sameer Singh, Jinho Choi, Andrew
McCallum. Seventeenth Conference on Computational Natural
Language Learning (CoNLL), 2013.
 Transition-based Dependency Parsing with Selectional Branching. Jinho D. Choi, Andrew McCallum. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
 Open Scholarship and Peer Review: a Time for Experimentation. David Soergel, Adam Saunders, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
 A New Dataset for Fine-Grained Citation Field Extraction. Sam Anzaroot, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
 Large-scale Author Coreference via Hierarchical Entity Representations. Michael L Wick, Ari Kobren, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
 Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia. Sameer Singh, Amar Subramanya, Fernando Pereira, Andrew McCallum. Technical Report (TR) UMASS-CS-2012-015, October 2012.
 Relation Extraction with Matrix Factorization and Universal Schemas.
Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum,
Joint Human Language Technology Conference/Annual Meeting of the North
American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2013.
 Latent Relation Representations for Universal Schemas. Sebastian Riedel, Limin Yao, Andrew McCallum. International Conference on Learning Representations (ICLR), 2013.
2012
 MAP Inference in Chains using Column Generation.
David Belanger, Alexandre Passos, Sebastian Riedel, Andrew
McCallum. Proceedings of Neural Information Processing (NIPS), 2012.
 Probabilistic Databases of Universal Schema. Limin Yao, Sebastian Riedel and Andrew McCallum, NAACL Workshop on Automatic Knowledge Base Construction (AKBC), 2012.
 Human Machine Cooperation with Epistemological DBs: Supporting User Corrections to Automatically Constructed KBs. Michael Wick, Karl Schultz, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012. (Best paper runner-up)
 Monte Carlo MCMC: Efficient Inference by Sampling Factors. Sameer Singh, Michael Wick, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012.
 Monte Carlo MCMC: Efficient Inference by Approximate Sampling.
Sameer Singh, Michael Wick, Andrew McCallum. Conference on Empirical
Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2012.
 Combining joint models for biomedical event extraction. David McClosky, Sebastian Riedel, Mihai Surdeanu, Andrew McCallum, Christopher Manning. BMC Bioinformatics, 2012.
 Speeding up MAP with Column Generation and Block Regularization.
David Belanger, Alexandre Passos, Sebastian Riedel and Andrew McCallum,
ICML Workshop on Inferning: Interactions between Inference and
Learning, (ICML WS), 2012.
 Parse, Price and Cut: Delayed Column and Row Generation for Graph-Based Parsers.
Sebastian Riedel, David A. Smith and Andrew McCallum, Proceedings of
the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
 A Discriminative Hierarchical Model for Fast Coreference at Large Scale. Michael Wick, Sameer Singh, Andrew McCallum. Association for Computational Linguistics (ACL), 2012.
 Unsupervised Relation Discovery with Sense Disambiguation.
Limin Yao, Sebastian Riedel and Andrew McCallum. Proceedings of the
50th Annual Meeting of the Association for Computational Linguistics (ACL), 2012.
 Topic Models for Taxonomies. Anton Bakalov, Andrew McCallum, Hanna Wallach and David Mimno. Proceedings of the Joint Conference on Digital Libraries (JCDL), 2012.
 Selecting Actions for Resource-bounded Information Extraction using Reinforcement Learning. Pallika Kanani, Andrew McCallum. Web Search and Data Mining (WSDM), 2012.
2011
 Correlations and anti-correlations in LDA inference.
Alexandre Passos, Hanna Wallach, Andrew McCallum. Neural Information
Processing Systems Workshop on Challenges in Learning Hierarchical
Models: Transfer Learning and Optimization (NIPS WS), 2011.
 Inducing Value Sparsity for Parallel Inference in Tree-shaped Models.
Sameer Singh, Brian Martin, Andrew McCallum. Neural Information
Processing Systems Workshop on Computational Tradeoffs in Statistical
Learning (NIPS WS), 2011.
 Towards Asynchronous Distributed MCMC Inference for Large Graphical Models.
Sameer Singh, Andrew McCallum. Neural Information Processing
Systems Workshop on Algorithms, Systems, and Tools for Learning at
Scale (NIPS WS), 2011.
 Query-Aware MCMC. Michael Wick and Andrew McCallum. Proceedings of Neural Information Processing Systems (NIPS), 2011.
 Toward Interactive Training and Evaluation. Greg Druck and Andrew McCallum. Conference on Information and Knowledge Management (CIKM), 2011.
 Model Combination for Event Extraction in BioNLP.
Sebastian Riedel, David McClosky, Mihai Surdeanu, Christopher D.
Manning and Andrew McCallum. Proceedings of the Natural Language
Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
 Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation.
Sebastian Riedel and Andrew McCallum. Proceedings of the Natural
Language Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
 Inter-Event Dependencies support Event Extraction from Biomedical Literature.
Roman Klinger, Sebastian Riedel and Andrew McCallum. Mining Complex
Entities from Network and Biomedical Data (MIND), Proceedings of the
European Conference on Machine Learning and Knowledge Discovery in
Databases (ECML PKDD), 2011.
 Structured Relation Discovery using Generative Models. Limin Yao, Aria Haghighi, Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
 Fast and Robust Joint Models for Biomedical Event Extraction. Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
 Optimizing Semantic Coherence in Topic Models.
David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, Andrew
McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
 SampleRank: Training Factor Graphs with Atomic Gradients.
Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta,
Andrew McCallum. Proceedings of the International Conference on Machine
Learning (ICML), 2011.
 Database of NIH grants using machine-learned categories and graphical clustering.
Edmund M Talley, David Newman, David Mimno, Bruce W Herr II, Hanna M
Wallach, Gully Burns, Miriam Leenders, Andrew McCallum. Nature Methods, 8, 443–444, 27 May 2011.
 Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models. Sameer Singh, Amarnag Subramanya, Fernando
Pereira, Andrew McCallum. Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011.
2010
 An Introduction to Conditional Random Fields.
Charles Sutton, Andrew McCallum. Foundations and Trends in Machine
Learning (FnT ML), to appear.
 Distantly labeling data for large scale cross-document coreference. Sameer Singh, Michael Wick, Andrew McCallum. Technical report on arXiv (TR), 2010.
 Distributed MAP Inference for Undirected
Graphical Models. Sameer Singh, Amarnag Subramanya, Fernando
Pereira, Andrew McCallum. Neural Information Processing Systems
Workshop on Learning on Cores, Clusters, and Clouds (NIPS WS),
2010.
 Machine Translation Using Overlapping Alignments and SampleRank. Benjamin
Roth, Andrew McCallum, Marc Dymetman and Nicola Cancedda. Proceedings
of the Ninth Conference of the Association for Machine Translation in
the Americas (AMTA), 2010.
 High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative
Models. Gregory Druck, Andrew McCallum. International Conference on
Machine Learning (ICML), 2010.
 Constraint-Driven Rank-Based Learning for Information Extraction. Sameer Singh, Limin
Yao, Sebastian Riedel, Andrew McCallum. Conference of the North
American Chapter of the Association for Computational Linguistics (NAACL HLT), 2010.
 Collective Cross-Document Relation Extraction Without Labelled Data. Limin
Yao, Sebastian Riedel, Andrew McCallum. Proceedings of Empirical
Methods in Natural Language Processing (EMNLP), 2010.
 Modeling Relations and Their Mentions without
Labeled Text. Sebastian Riedel, Limin Yao, Andrew McCallum.
Proceedings of the European Conference on Machine Learning (ECML/PKDD),
2010.
 Resource-bounded Information Extraction: Acquiring Missing Feature Values On Demand.
Pallika H. Kanani, Andrew McCallum, Shaohan Hu. Proceedings of the 14th
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD),
2010. (Best paper runner-up.)
 Scalable
Probabilistic Databases with Factor Graphs and MCMC. Michael Wick,
Andrew McCallum, Gerome Miklau. Proceedings of the International
Conference on Very Large Databases (VLDB), 2010.
2009
 FACTORIE:
Probabilistic Programming via Imperatively Defined Factor Graphs.
Andrew McCallum, Karl Schultz, Sameer Singh. Neural Information
Processing Systems (NIPS), 2009.
 Rethinking
LDA: Why Priors Matter. Hanna Wallach, David Mimno, Andrew
McCallum. Neural Information Processing Systems (NIPS),
2009.
 Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference.
Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum.
Neural Information Processing Systems (NIPS), 2009.
 SampleRank:
Learning Preferences from Atomic Gradients. Michael Wick, Khashayar
Rohanimanesh, Aron Culotta, Andrew McCallum. Neural Information
Processing Systems Workshop on Advances in Ranking (NIPS WS),
2009.
 Bidirectional Joint Inference for Entity Resolution and
Segmentation using Imperatively-Defined Factor Graphs. Sameer
Singh, Karl Schultz, Andrew McCallum. European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases (ECML
PKDD), 2009.
 Efficient
Methods for Topic Model Inference on Streaming Document Collections.
Limin Yao, David Mimno and Andrew McCallum. Conference on Knowledge
Discovery and Data Mining (KDD), 2009, Paris, France.
 Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text
Alignment. Kedar Bellare and Andrew McCallum. Proceedings of
Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
 Polylingual
Topic Models.
David Mimno, Hanna Wallach, Jason Naradowsky, David Smith and Andrew
McCallum. Proceedings of the Empirical Methods in Natural Language
Processing (EMNLP), Singapore, 2009.
 Active Learning by Labeling Features. Gregory Druck, Burr Settles, Andrew
McCallum. Proceedings of Empirical Methods in Natural Language
Processing (EMNLP), 2009.
 Inference and Learning in Large Factor Graphs with Adaptive Proposal Distributions.
Khashayar Rohanimanesh, Michael Wick, Andrew McCallum. University of
Massachusetts Technical Report #UM-CS-2009-008 (TR), 2009.
 Advances in Learning and Inference for Partition-wise Models of Coreference
Resolution. Michael Wick and Andrew McCallum. University of
Massachusetts Technical Report #UM-CS-2009-028 (TR), 2009.
 Representing Uncertainty in Databases with Scalable Factor Graphs. Michael Wick.
Masters Thesis/Synthesis. Readers: Andrew McCallum and Gerome Miklau.
April 2009.
 An Entity Based Model for Coreference Resolution.
Michael Wick, Aron Culotta, Khashayar Rohanimanesh, Andrew McCallum.
Proceedings of the SIAM International Conference on Data Mining (SDM),
Reno, Nevada, 2009.
 Alternating Projections for Learning with Expectation Constraints. Kedar
Bellare, Gregory Druck and Andrew McCallum. Uncertainty in Artificial
Intelligence (UAI), 2009.
 Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria.
Gregory Druck, Gideon Mann, Andrew McCallum. Proceedings of the
Association for Computational Linguistics (ACL), 2009.
 Towards Theoretical Bounds for Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani, Andrew McCallum, Ramesh Sitaraman. UMass Technical Report #UM-CS-2009-027 (TR), 2009.
 Generalized Expectation Criteria with application to
Semi-Supervised Classification and Sequence Modeling. Gideon Mann
and Andrew McCallum. Journal of Machine Learning Research (JMLR).
To appear.
2008
 Reinforcement Learning for MAP Inference in Large Factor Graphs.
Khashayar Rohanimanesh, Michael Wick, Sameer Singh, and Andrew
McCallum. University of Massachusetts Technical Report #UM-CS-2008-040 (TR),
2008.
 Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors.
David Mimno, Hanna Wallach and Andrew McCallum. NIPS Workshop on
Analyzing Graphs (NIPS WS), 2008, Whistler, BC.
 FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via
Imperative Declarations of Structure, Inference and Learning.
Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz,
Sameer Singh. NIPS Workshop on Probabilistic Programming (NIPS WS), 2008.
(Discriminatively trained undirected graphical models, or conditional
random fields, have had wide empirical success, and there has been
increasing interest in toolkits that ease their application to complex
relational data. Although there has been much historic interest in the
combination of logic and probability, we argue that in this mixture
'logic' is largely a red herring. The power in relational models is in
their repeated structure and tied parameters; logic is not necessarily
the best way to define these structures. Rather than using a declarative
language, such as SQL or first-order logic, we advocate using an
object-oriented imperative language to express various aspects of model
structure, inference and learning. By combining the traditional,
declarative, statistical semantics of factor graphs with imperative
definitions of their construction and operation, we allow the user to
mix declarative and procedural domain knowledge, and also gain
significant efficiencies. We have implemented our ideas in a system we
call FACTORIE, a software library for an object-oriented,
strongly-typed, functional JVM language named Scala.)
 A Discriminative Approach to Ontology Alignment.
Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, and AnHai Doan.
In the International Workshop on New Trends in Information Integration
(NTII) at the conference for Very Large Databases (VLDB WS),
Auckland, New Zealand, 2008. (New state-of-the-art results on ontology
alignment using graph-shaped conditional random fields, joint inference,
and parameter estimation by Rank-Based Training.)
 A Unified Approach for Schema Matching, Coreference, and Canonicalization. Michael Wick,
Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum. In Conference on
Knowledge Discovery and Data Mining (KDD), 2008. (Information
integration, performing joint inference over schema matching, entity
resolution and canonicalization, using conditional random fields,
features encoding clauses in first-order logic, and efficient inference
by Metropolis-Hastings. Positive experimental results on multiple data
sets.)
 Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew
McCallum. In Conference on Knowledge Discovery and Data Mining (KDD),
2008. (Hierarchical Dirichlet process model that jointly clusters
citation venue strings based on both string-edit distance and title
information.)
 Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors.
Hanna Wallach, Charles Sutton, Andrew McCallum. In International
Conference on Machine Learning, Workshop on Prior Knowledge for Text
and Language Processing (ICML WS), 2008. (Two Bayesian dependency
parsing models: 1. a model with a Pitman-Yor prior that significantly
improves on Eisner's classic model; 2. a latent-variable model that
learns "syntactic" topics.)
 Learning from Labeled Features using Generalized Expectation Criteria. Gregory Druck,
Gideon Mann and Andrew McCallum. Proceedings of ACM Special Interest
Group on Information Retrieval (SIGIR), 2008. (Learn classifiers by
labeling features rather than instances. Extensive evaluation on many
text data sets, showing substantial improvement over other methods of
semi-supervised learning.)
 Learning to Predict
the Quality of Contributions to Wikipedia. Gregory Druck, Gerome
Miklau and Andrew McCallum. AAAI Workshop on Wikipedia and AI, (AAAI
WS), 2008. (Predict
the longevity of an edit to Wikipedia, using textual features of the
edit as well as features of the editor. Could be part of a tool to
prioritize verification of changes to Wikipedia.)
 Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. David
Mimno and Andrew McCallum. (Plenary presentation.) Conference on
Uncertainty in Artificial Intelligence (UAI), 2008. (Text documents are
usually accompanied by metadata, such as the authors, the publication
venue, the date, and any references. Work in topic modeling that has
taken such information into account, such as the Author-Topic,
Citation-Topic, and Topic-over-Time models, has generally focused on
constructing specific models that are suited only for one particular
type of metadata. This paper presents a simple, unified model for
learning topics from documents given arbitrary non-textual features,
which can be discrete, categorical, or continuous.)
 Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields.
Gideon Mann and Andrew McCallum. Proceedings of the Association for
Computational Linguistics (ACL), 2008. (Generalized expectation for
semi-supervised learning of linear-chain conditional random fields.)
 Piecewise Training for Structured Prediction. Charles Sutton and Andrew McCallum. Accepted
to the Machine Learning Journal (MLJ), 2008. (Efficiently train CRFs in
parts. It works well even though full joint inference is used at test
time.)
 Pachinko Allocation: Scalable Mixture Models of Topic Correlations. Wei Li and Andrew
McCallum. Submitted to the Journal of Machine Learning Research (JMLR),
2008. (The pachinko allocation model represents nested correlations
among topics using a DAG. This paper presents work on efficiently
fitting these models (as well as plain old LDA) by creating and
leveraging sparsity in the distribution over topics to be sampled for
each document.)
2007
 Unsupervised Coreference of Publication Venues. Robert Hall, Charles Sutton
and Andrew McCallum. University of Massachusetts Amherst Technical
Report (TR), 2007. (A generative nonparametric mixture model for entity
resolution of publication venues that leverages both venue titles and
distributions over words in paper titles.)
 Generalized Expectation Criteria. Andrew McCallum, Gideon Mann and Gregory Druck.
University of Massachusetts Amherst Technical Report #2007-60 (TR),
2007. (This note introduces and motivates Generalized Expectation (GE)
criteria. GE criteria are terms in a parameter-estimation objective
function that express preferences about model expectations. In certain
simple cases, GE falls into the same equivalence class as moment
matching, maximum likelihood and maximum entropy estimation. However,
our work focuses on leveraging GE's special flexibility in three
non-traditional ways: (1) GE criteria can be specified independently of
the model parameterization. In factor graphs, we break the traditional
one-to-one mapping between (a) subsets of variables participating in
parameterized model factors and (b) subsets of variables over which the
objective function's expectations are calculated. (2) Within the same
objective function, multiple GE terms that are conditional expectations
can be conditioned on multiple different data sets. This is useful for
semi-supervised learning and transfer learning. (3) A target expectation
(or, more generally, the expectation preference function) can come from
any source, including other tasks or human domain knowledge. GE is the
successor to Expectation Regularization, which is described in our ICML
2007 paper below.)
 Reducing Annotation Effort using Generalized Expectation Criteria (DRAFT). Gregory
Druck, Gideon Mann and Andrew McCallum. University of Massachusetts
Amherst Technical Report #2007-62 (TR), 2007. (A version of Generalized
Expectation (GE) in which the supervision is provided by labeling
features instead of instances. Dramatically faster wall-clock labeling
to achieve high accuracy. Experiments on document classification.)
 Community-based Link Prediction with Text.
David Mimno, Hanna M. Wallach and Andrew McCallum. In Proceedings of
the NIPS 2007 Workshop on Statistical Network Modeling (NIPS WS), 2007.
(New state-of-the-art results in link prediction using a
latent-variable topic model, in which "community" variables are
associated with topic distributions and author distributions. Thus the
model combines the use of language/topics and co-authorships to
discover communities.)
 Leveraging Existing Resources using Generalized Expectation Criteria. Gregory
Druck, Gideon Mann and Andrew McCallum. NIPS Workshop on Learning
Problem Design (NIPS WS), 2007. (Generalized Expectation applied in
situations in which there is no labeled data. All supervision is
obtained from existing auxiliary resources such as lexicons.
Experiments on information extraction.)
 Lightly-Supervised Attribute Extraction for Web Search.
Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando
Pereira, Mark Liberman, Andrew McCallum and Mark Dredze. NIPS Workshop
on Machine Learning for Web Search (NIPS WS), 2007. (Extract a large
number of attributes of different entities from natural language text.
Methods based on co-training and maximum entropy classifiers.)
 PeopleLDA: Anchoring Topics to People Using Face Recognition. Vidit Jain, Erik
Learned-Miller, and Andrew McCallum. International Conference on
Computer Vision (ICCV), 2007. (Jointly model people's identity, face
appearance in an image, and surrounding text in the image captions with
an LDA-style topic model. Improved results in identifying coherent sets
of person "mentions", that is, improved coreference by using both text
and image features.)
 Joint Group and Topic Discovery from Relations and Text.
Andrew McCallum, Xuerui Wang and Natasha Mohanty. Statistical Network
Analysis: Models, Issues and New Directions, Lecture Notes in Computer
Science 4503, pp. 28-44 (Book chapter), 2007. (Book chapter version of
our NIPS 2006 conference paper. Social network analysis that
simultaneously discovers groups of entities and also clusters
attributes of their relations, such that clustering in each dimension
informs the other. Applied to the voting records and corresponding text
of resolutions from the U.S. Senate and the U.N., showing that
incorporating the votes results in more salient topic clusters, and
that different groupings of legislators emerge from different topics.)
 Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval.
Xuerui Wang, Andrew McCallum and Xing Wei. Proceedings of the 7th IEEE
International Conference on Data Mining (ICDM), 2007. (A topic model in
the LDA style that uses a Markov model to automatically discover
topically-relevant, arbitrary-length phrases, not just lists of single
words. The phrase discovery is not simply a post-processing step, but
an intrinsic part of the model that helps it discover better topics.
Experiments on document retrieval tasks.)
 Canonicalization of Database Records using Adaptive Similarity Measures.
Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli and Andrew
McCallum. Conference on Knowledge Discovery and Data Mining (KDD),
2007. (Defines and explores the problem of "canonicalization": selecting
the best field values for a single, standard record formed from a set
of consolidated, co-resolved information sources, such as arise from
merging databases or combining multiple sources of information
extraction.)
 Generalized Component
Analysis for Text with Heterogeneous Attributes. Xuerui Wang, Chris
Pal and Andrew McCallum. Conference on Knowledge Discovery and Data
Mining (KDD), 2007. (A topic
model based on an undirected graphical model, which makes it easier to
incorporate multiple modalities.)
 Semi-Supervised Classification with Hybrid Generative/Discriminative Methods.
Greg Druck, Chris Pal, Xiaojin Zhu and Andrew McCallum. Conference on
Knowledge Discovery and Data Mining (KDD), 2007. (Leverage unlabeled
data for text classification by using an objective function that
combines (1) the joint probability of labels and words and (2) the
conditional probability of labels given words.)
 Expertise Modeling for Matching Papers with Reviewers. David Mimno and Andrew McCallum.
Conference on Knowledge Discovery and Data Mining (KDD),
2007. (The Author-Persona-Topic model is an LDA-style topic model
especially designed to represent expertise as a mixture of topical
intersections. We show positive results in matching reviewers to
conference papers, as assessed by human judgements.)
 Learning Extractors from Unlabeled Text using Relevant Databases. Kedar Bellare and
Andrew McCallum. Sixth International Workshop on Information
Integration on the Web (IIWeb), collocated with AAAI,
2007. (Use conditional random fields to learn information extractors
both from DB fields and from alignments of DB records in free text.
Uses an Alignment CRF, similar to our UAI 2005 paper.)
 Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating
Web Pages as Graph Nodes. Pallika Kanani and Andrew McCallum. Sixth
International Workshop on Information Integration on the Web (IIWeb),
collocated with AAAI, 2007. (Improve entity resolution by adding web
pages as new "mentions" to the graph-partitioning problem, and do so
efficiently by selecting a subset of the possible queries and a subset
of the returned pages.)
 Probabilistic Representations for Integrating Unreliable Data Sources. David
Mimno and Andrew McCallum. Sixth International Workshop on Information
Integration on the Web (IIWeb), collocated with AAAI,
2007. (Probabilistic representation of field values used in merging and
augmenting information from DBLP and research paper PDFs.)
 Author Disambiguation using Error-Driven Machine Learning With a Ranking Loss Function.
Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew
McCallum. Sixth International Workshop on Information Integration on
the Web (IIWeb), collocated with AAAI, 2007. (Entity resolution of
people using high-order features, made efficient with
Metropolis-Hastings and SampleRank, a learning method based on ranking.)
 Nonparametric Bayes Pachinko Allocation. Wei Li, David Blei and Andrew McCallum.
Conference on Uncertainty in Artificial Intelligence (UAI),
2007. (A version of pachinko allocation that automatically determines
the number of topics (and super-topics) and its sparse connectivity
structure via Dirichlet process priors. Positive results in
rediscovering known structure in synthetic data, and in held-out
likelihood versus PAM, hLDA and HDP.)
 Improved Dynamic Schedules
for Belief Propagation. Charles Sutton and Andrew McCallum.
Conference on Uncertainty in Artificial Intelligence (UAI),
2007. (Significantly
faster inference in graphical models by selecting which BP messages to
send based on an approximation to their residual.)
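(The message-selection idea can be sketched as residual-ordered belief propagation on a small pairwise model. This is an illustrative reconstruction, not the authors' code; it recomputes exact residuals, whereas the paper's contribution is a cheaper approximation to them.)

```python
import heapq

def residual_bp(unary, pair, n_states, max_updates=1000, tol=1e-9):
    """Loopy BP that always sends the message with the largest residual next."""
    edges = [(i, j) for (i, j) in pair] + [(j, i) for (i, j) in pair]
    msgs = {e: [1.0 / n_states] * n_states for e in edges}

    def pot(i, j, si, sj):
        # pairwise potential, looked up in whichever direction it is stored
        return pair[(i, j)][si][sj] if (i, j) in pair else pair[(j, i)][sj][si]

    def compute(i, j):
        # fresh message i -> j from the current incoming messages at i
        out = []
        for sj in range(n_states):
            total = 0.0
            for si in range(n_states):
                prod = unary[i][si] * pot(i, j, si, sj)
                for (k, l) in edges:
                    if l == i and k != j:
                        prod *= msgs[(k, l)][si]
                total += prod
            out.append(total)
        z = sum(out)
        return [x / z for x in out]

    def residual(e):
        new = compute(*e)
        return max(abs(a - b) for a, b in zip(new, msgs[e])), new

    heap = [(-residual(e)[0], e) for e in edges]
    heapq.heapify(heap)
    updates = 0
    while heap and updates < max_updates:
        _, e = heapq.heappop(heap)
        r, new = residual(e)          # stored priority may be stale; recompute
        if r < tol:
            continue
        msgs[e] = new
        updates += 1
        for f in edges:               # downstream messages change; re-queue them
            if f[0] == e[1] and f[1] != e[0]:
                heapq.heappush(heap, (-residual(f)[0], f))

    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for (k, l) in edges:
            if l == i:
                b = [bv * mv for bv, mv in zip(b, msgs[(k, l)])]
        z = sum(b)
        beliefs[i] = [x / z for x in b]
    return beliefs
```

(On a tree this recovers the exact marginals; the scheduling only matters on loopy graphs.)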
 Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. Gideon Mann and Andrew McCallum. International Conference on Machine Learning (ICML), 2007. (Semi-supervised learning is seldom used in real applications because it is often complicated to implement, fragile in tuning or inefficient for large data. We introduce a new highly usable approach to semi-supervised learning, augmenting traditional label log-likelihood with an additional term that encourages model predictions on unlabeled data to match certain expectations. Positive results on 5 data sets versus EM, transductive SVM, entropy regularization and a graph-based method.)
 Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. ICML, 2007. (Train a large CRF five times faster by dividing it into separate pieces and reducing the number of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
 Mixtures of Hierarchical Topics with Pachinko Allocation. David Mimno, Wei Li and Andrew McCallum. ICML, 2007. (From a large document collection automatically discover topic hierarchies, where documents may be flexibly represented as mixtures across multiple leaves, not just mixtures up and down a single leaf-root path. Thus, for example, we can represent a document about instructing a robot in natural language, where those two topics are leaves. This new model, hPAM, combines the best of pachinko allocation (PAM) and hierarchical LDA (hLDA). Dramatic improvements in held-out data likelihood and mutual information between discovered topics and human-assigned categories.)
 Transfer Learning for Enhancing Information Flow in Organizations and Social Networks. Chris Pal, Xuerui Wang and Andrew McCallum. Submitted to Conference on Email and Spam (CEAS), 2007. Technical Note. (Continuous hidden variable conditional random field for CC prediction/suggestion in email.)
 Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email. Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel. Journal of Artificial Intelligence Research (JAIR), 2007. (Journal paper version of IJCAI conference paper on the Author-Recipient-Topic (ART) model.)
 Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields. Gideon Mann and Andrew McCallum. NAACL/HLT, (short paper) 2007. (A new, faster dynamic program for calculating the entropy of a finite-state subsequence and its gradient.)
 First-Order Probabilistic Models for Coreference Resolution. Aron Culotta, Michael Wick, Robert Hall and Andrew McCallum. NAACL/HLT, 2007. (Traditional coreference uses features only over pairs of mentions. Here we present a conditional random field with first-order logic for expressing features, enabling features over sets of mentions. The result is a new state of the art on ACE 2004 coreference, jumping from 69 to 79, a 45% reduction in error. The advance depends crucially on a new method of parameter estimation for such "weighted logic" models, based on learning rankings and error-driven training.)
 Sparse Message Passing Algorithms for Weighted Maximum Satisfiability. Aron Culotta, Andrew McCallum, Bart Selman, Ashish Sabharwal. New England Student Symposium on Artificial Intelligence (NESCAI), 2007. (A new algorithm for solving weighted maximum satisfiability (WMAX-SAT) problems that divides a large problem into subproblems, and coordinates the global solution by message passing with sparse messages. Inspired by the desire to do joint inference in (a) large weighted logics a la Markov Logic Networks, and (b) large NLP pipelines, in which there are efficient pre-existing (dynamic programming) solutions to subparts of the pipeline. Positive results versus WalkSAT!)
 Cryptogram Decoding for OCR using Numerization Strings. Gary Huang, Erik Learned-Miller and Andrew McCallum. ICDAR, 2007. (Robust OCR without font appearance models by incorporating language modeling.)
 Penn/UMass/CHOP Biocreative II Systems. Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, and Peter White. Biocreative II Evaluation Workshop. 2007. (Description of our high-ranking entry in the competition for extraction and linkage from bioinformatics text.)
 Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani and Andrew McCallum. Conference on Computational Learning Theory (COLT) Open Problems Track, 2007. (We present a new class of problems in which the goal is to perform correlation clustering under circumstances in which accuracy can be improved by augmenting the given graph with additional information.)
 Organizing the OCA: Learning faceted subjects from a library of digital books. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (Introduces the DCM-LDA topic model, which represents topics by a Dirichlet compound multinomial rather than a multinomial. In addition to obtaining interesting information about the different variances of the topics, this model lends itself to efficient parallelization with very coarse-grained synchronization. The result is a topic model that can run on over 1 billion words in just a few hours.)
 Mining a digital library for influential authors. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (A probabilistic model that ranks authors based on their influence on particular areas of scientific research. Integrates topics with citation patterns.)
 Improving Author Coreference by Resource-bounded Information Gathering from the Web. Pallika Kanani, Andrew McCallum and Chris Pal. International Joint Conference on Artificial Intelligence (IJCAI), 2007. (Sometimes there is simply insufficient information to make an accurate entity resolution decision, and we must gather additional evidence. This paper describes the use of web queries to improve research paper author coreference, exploring two methods of augmenting a graph partitioning problem: using the web to obtain new features on existing edges, and using the web to obtain new nodes in the graph. We then go on to describe decision-theoretic approaches for maximizing accuracy gain with a limited budget of web queries, and demonstrate our methods on three large data sets.)
 Dynamic Conditional Random Fields. Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh. Journal of Machine Learning Research (JMLR), Vol. 8(Mar), pages 693-723, 2007. (Journal paper version of ICML paper by the same authors, with new experiments on marginal likelihood training.)
2006
 On Discriminative and Semi-Supervised Dimensionality Reduction. Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum. Advances in Neural Information Processing Systems, Workshop on Novel Applications of Dimensionality Reduction, (NIPS Workshop), 2006. (Using Multi-Conditional Learning, learn to distribute mixture components just where needed to address some discriminative task. See compelling figure on synthetic overlapping spiral data.)
 Learning Field Compatibilities to Extract Database Records from Unstructured Text. Michael Wick, Aron Culotta and Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2006. (Record extraction, jointly accounting for multi-field compatibility by content and layout features.)
 Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model. Positive results on entity resolution (of research paper authors) are described.)
 Corrective Feedback and Persistent Learning for Information Extraction. Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola. Artificial Intelligence Journal (AIJ), volume 170, pages 1101-1122, 2006. (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction. Journal paper version of AAAI paper by the same authors below. Adds experiments with active learning.)
 CC Prediction with Graphical Models. Chris Pal and Andrew McCallum. Conference on Email and Anti-Spam (CEAS), 2006. (Help keep an organization coordinated by suggesting who to carbon-copy on your outgoing email message.)
 Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta, Andrew McCallum. HLT Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006. (Markov Logic Networks are Conditional Random Fields that use first-order logic to define features and parameter tying patterns. Making such models scale to non-trivial data set sizes is a challenge because the size of the full instantiation of the model is exponential in the arity of the formulae. Here we describe a method of partial instantiation that allows such models to scale to entity resolution problems with millions of entity mentions. On both citation and author entity resolution problems we show that including such first-order features provides increases in accuracy.)
 A Continuous-Time Model of Topic Co-occurrence Trends. Xuerui Wang, Wei Li, and Andrew McCallum. AAAI Workshop on Event Detection, 2006. (Capture the time distributions not only of topics, but also of their co-occurrences. For example, notice that while NLP and ML have both been around for a long time, their co-occurrence has been rising only recently. The model is effectively a combination of the Pachinko Allocation Model (PAM) and Topics-over-Time (TOT).)
 Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning. Michael Kelm, Chris Pal, and Andrew McCallum. Draft accepted to the International Conference on Pattern Recognition (ICPR), 2006. (Multi-conditional learning explored in the context of computer vision.)
 Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. Andrew McCallum, Chris Pal, Greg Druck, Xuerui Wang. AAAI, 2006. (Estimate parameters of an undirected graphical model not by joint likelihood, or conditional likelihood, but by a product of multiple conditional likelihoods. Can act as an improved regularizer. With latent variables, can cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models and (cross-cutting) conditional training. Improved results on document classification, Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task.)
 Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. Wei Li, and Andrew McCallum. ICML, 2006. (An LDA-style topic model that captures correlations between topics, enabling discovery of finer-grained topics. Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary, nested and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and held-out data likelihood.)
 Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Xuerui Wang and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD) 2006. (A new LDA-style topic model that models trends over time. The meaning of a topic remains fixed and reliable, but its prevalence over time is captured, and topics may thus focus in on co-occurrence patterns that are time-sensitive. Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Improvements in topic saliency and the ability to predict time given words.)
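(The per-topic timestamp distribution can be sketched as follows. This is a hypothetical fragment of a collapsed Gibbs sampler: the Beta parameterization over rescaled timestamps follows the paper, but the count-dictionary layout and function names are illustrative.)

```python
import math

def beta_pdf(t, a, b):
    # density of Beta(a, b) at t in (0, 1); timestamps are rescaled into (0, 1)
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * t ** (a - 1) * (1.0 - t) ** (b - 1)

def tot_topic_weights(word, t, doc_topic, topic_word, topic_total,
                      topic_time, alpha, beta, vocab_size):
    """Unnormalized Gibbs weights for assigning one token (word, timestamp t)
    to each topic: the usual LDA count ratios times a per-topic Beta density
    over the timestamp."""
    weights = []
    for z in range(len(topic_total)):
        a, b = topic_time[z]
        w = ((doc_topic[z] + alpha)
             * (topic_word[z].get(word, 0) + beta)
             / (topic_total[z] + vocab_size * beta)
             * beta_pdf(t, a, b))
        weights.append(w)
    return weights
```

(With equal word counts, a topic whose Beta density peaks near the token's timestamp dominates the sampling distribution, which is how the model makes topics time-sensitive.)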
 Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition. Shaolei L. Feng, R. Manmatha and Andrew McCallum. IEEE International Conference on Document Image Analysis for Libraries (DIAL 06), pp. 30-37. 2006. (Mixed results on CRFs applied to handwritten word recognition.)
 Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. HLT-NAACL, 2006. (Separately train CRFs with different subsets of the features, then integrate them at test time, with four different variations on the method. Especially make more reliable use of lexicon features and other highly-predictable but brittle features.)
 Integrating Probabilistic Extraction Models and Relational Data Mining to Discover Relations and Patterns in Text. Aron Culotta, Andrew McCallum and Jonathan Betz. HLT-NAACL, 2006. (Extract relations from Wikipedia articles. Run data mining on the relational graph to obtain patterns that are predictive of relations, such as "opponent of my opponent is my ally" and "a person is likely to have the same religion as their parents." Then use features derived from these patterns in a second run of extraction that improves accuracy.)
 Bibliometric Impact Measures Leveraging Topic Analysis. Gideon Mann, David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL) 2006. (Use a new topic model that leverages n-grams to discover interpretable, fine-grained topics in over a million research papers. Use these topic divisions as well as automated citation analysis to extend three existing bibliometric impact measures, and create three new ones: Topical Diversity, Topical Transfer, Topical Precedence.)
 An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. Book chapter in Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006. (An overview and introduction to conditional random fields for beginners and experts alike: motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, tips and tricks, and an example application to information extraction with a skip-chain structure.)
 Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, and Andrew McCallum. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006. (An alternative method for beam search based on variational principles. Enables not only faster test-time performance of large-state-space CRFs, but this method makes beam search robust enough to be used at training time, enabling dramatically faster learning of discriminative finite-state methods for speech, IE and other applications.)
 Table extraction for answer retrieval. Xing Wei, Bruce Croft and Andrew McCallum. Information Retrieval Journal (IRJ), volume 9, issue 5, pages 589-611, November 2006. (Information extraction from tables, using conditional random fields with language and layout features, with application to question answering. Journal paper version of our SIGIR 2003 paper.)
 Semi-supervised Text Classification Using EM. Kamal Nigam, Andrew McCallum and Tom Mitchell. Book chapter in Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006. (Overview, description, experiments on using expectation maximization with naive Bayes text classifiers for learning from labeled and unlabeled data. A chapter in a book about various methods of semi-supervised learning.)
 Group and Topic Discovery from Relations and Their Attributes. Xuerui Wang, Natasha Mohanty and Andrew McCallum. Neural Information Processing Systems (NIPS), 2006. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
2005
 A Note on Topical N-grams. Xuerui Wang and Andrew McCallum. University of Massachusetts Technical Report UM-CS-2005-071, 2005. (Discover topics like Latent Dirichlet Allocation, but model phrases in addition to single words on a per-topic basis. For example, in the Politics topic, "white house" has special meaning as a collocation, while in the Real-Estate topic, modeling the individual words is sufficient. Our TNG model produces much cleaner, more interpretable topics.)
 Pachinko allocation: A Directed Acyclic Graph for Topic Correlations. Wei Li and Andrew McCallum. NIPS Workshop on Nonparametric Bayesian Methods, 2005. (Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and classification, as well as improved held-out likelihood over CTM. See ICML 2006 paper above.)
 Direct Maximization of Rank-Based Metrics for Information Retrieval. Don Metzler, W. Bruce Croft and Andrew McCallum. CIIR Technical Report IR-429, 2005.
 Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, volume 3, Number 9, November 2005. (An overview of information extraction by machine learning methods, written for people not familiar with machine learning, especially CTOs and other people in business.)
 Learning Clusterwise Similarity with First-order Features. Aron Culotta and Andrew McCallum. NIPS Workshop on the Theoretical Foundations of Clustering. 2005. (Discriminatively-trained graph-partitioning methods for clustering, with features over entire clusters, including existential and universal quantifiers. Efficiently instantiate these features only on demand.)
 Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks, so that the final target task can actually affect the output of the intermediate task.)
 Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common underappreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements, including new best results on CoNLL-2000 NP Chunking.)
 Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005. (Further results with "piecewise training", a method also described in a UAI'05 paper below.)
 Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta and Andrew McCallum. Technical Report IR-430, University of Massachusetts, September 2005. (Use existential and universal quantifiers in Markov Logic, doing so practically and efficiently by incrementally instantiating these terms as needed. Applied to object correspondence, this model combines the expressivity of BLOG with the predictive accuracy advantages of conditional probability training. Experiments on citation matching and author disambiguation.)
 Joint Deduplication of Multiple Record Types in Relational Data. Aron Culotta and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Longer Tech Report version: A Conditional Model of Deduplication for Multi-type Relational Data. Technical Report IR-443, University of Massachusetts, September 2005.) (Leverage relations among multiple entity types to perform coreference collectively among all types. Uses CRF-style graph partitioning with a learned distance metric. Experimental results on joint coreference of both citations and their venues, showing that accuracy on both improves.)
 Collective Multi-Label Classification. Nadia Ghamrawi and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Multi-label document classification with a conditional maximum entropy model that captures not only the traditional dependencies between words and the class labels, but also the co-occurrence dependencies between the class labels. Performs joint inference among all class labels.)
 Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis. Andrew McCallum, Xuerui Wang and Chris Pal. UMass Technical Report UM-CS-2005-053, version 2.1. 2005. (Cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models that are conditionally-trained. Improved results on Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. This is an evolving Tech Report, which needs to be updated; in particular we are now referring to this method as "Multi-Conditional Learning" or "Multi-Conditional Mixtures".)
 Group and Topic
Discovery from Relations and Text.
Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link
Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social
network analysis that simultaneously discovers groups of entities and
also clusters attributes of their relations, such that clustering in
each dimension informs the other. Applied to the voting records and
corresponding text of resolutions from the U.S. Senate and the U.N.,
showing that incorporating the votes results in more salient topic
clusters, and that different groupings of legislators emerge from
different topics.)
 Detecting Anomalies in
Network Traffic Using Maximum Entropy Estimation. Yu Gu, Andrew
McCallum and Don Towsley. Internet Measurement Conference, 2005. (Build
a density model of normal Internet traffic with Maximum Entropy and
feature induction. Detect network attacks by density threshold.)
 A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Andrew McCallum, Kedar Bellare and Fernando Pereira. Conference on Uncertainty in AI (UAI), 2005. (Train a string edit distance function from both positive and negative examples of string pairs (matching and mismatching). Significantly, the model designer is free to use arbitrary, fancy features of both strings, and also very flexible edit operations. This model is an example of an increasingly popular and interesting class: conditionally-trained models with latent variables. Positive results on citations, addresses and names.)
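(The alignment backbone of such a model is the classic edit-distance dynamic program. The sketch below uses fixed, caller-supplied operation costs; the paper's contribution is to learn feature-rich operation scores discriminatively on top of this recursion.)

```python
def weighted_edit_distance(s, t, sub_cost, ins_cost, del_cost):
    """Alignment DP with per-operation costs supplied by the caller.
    With 0/1 costs this is plain Levenshtein distance; the CRF model
    replaces these costs with learned, feature-based scores."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost(s[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),
                          d[i - 1][j] + del_cost(s[i - 1]),
                          d[i][j - 1] + ins_cost(t[j - 1]))
    return d[m][n]
```

(In the CRF version the min is replaced by a sum-product over latent alignments, so the whole distance becomes differentiable in the operation parameters.)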
 Joint Parsing and Semantic Role Labeling. Charles Sutton and Andrew McCallum. CoNLL (Shared Task), 2005. (Attempt to improve accuracy by performing joint inference over parsing and semantic role labeling, preserving uncertainty and multiple hypotheses in Dan Bikel's parser. Unfortunately the effort yielded negative results, most likely because the components needed to produce better-calibrated probabilities.)
 Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)
 Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
 Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
 Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with the Role-Author-Recipient-Topic model. Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
 Piecewise Training for
Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently
train a large graphical model in separately normalized pieces, and
amazingly often obtain higher accuracy than without this approximation.
This paper also shows that this piecewise objective is a lower bound on
the exact likelihood, and gives results with three different graphical
model structures.)
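(For a linear chain with only edge factors, the piecewise objective reduces to a sum of locally normalized terms, with no global forward-backward required. A toy sketch, assuming a simplified tied edge parameterization `theta[(a, b)]` chosen here for illustration:)

```python
import math

def piecewise_loglik(seq, theta, n_states):
    """Piecewise objective for a toy linear chain: each edge factor is
    normalized locally over its own pair of variables, so the per-piece
    partition function is a sum over n_states**2 assignments rather than
    a global sum over all label sequences."""
    logz = math.log(sum(math.exp(theta[(u, v)])
                        for u in range(n_states) for v in range(n_states)))
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        total += theta[(a, b)] - logz
    return total
```

(Because every piece is normalized independently, this objective decomposes over factors and, as the paper shows, lower-bounds the exact likelihood.)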
 Constrained Kronecker
Deltas for Fast Approximate Inference and Estimation. Chris Pal,
Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes
the graph of the graphical model is not large and complex, but the
cardinality of the variables is large. This paper describes a new and
generalized method for beam search on graphical models, showing
positive experimental results for both inference and training.
Experiments on NetTalk.)
 Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once, made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
 Disambiguating Web Appearances
of People in a Social Network. Ron Bekkerman and Andrew McCallum.
WWW Conference, 2005. (Find
homepages and other Web pages mentioning particular people. Do a better
job by leveraging a collection of related people.)
2004
 Piecewise Training with Parameter Independence Diagrams: Comparing Globally and Locally-trained Linear-chain CRFs. Andrew McCallum and Charles Sutton. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-383. 2004. (Also presented at NIPS 2004 Workshop on Learning with Structured Outputs.) (Large undirected graphical models are expensive to train because they require global inference to calculate the gradient of the parameters. We describe a new method for fast training in locally-normalized pieces. Amazingly, the resulting models also give higher accuracy than their globally-trained counterparts.)
 Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora. Ron Bekkerman, Andrew McCallum and Gary Huang. UMass CIIR Technical Report IR-418, 2004. (Extensive experiments on real-world email foldering.)
 The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang. Technical Report UM-CS-2004-096, 2004. (Also presented at the NIPS'04 Workshop on "Structured Data and Representations in Probabilistic Models for Categorization.") (Social network analysis that not only models links between people, but the word content of the messages exchanged between them. Discovers salient topics guided by the sender-recipient structure in data, and provides improved ability to measure role-similarity between people. A generative model in the style of Latent Dirichlet Allocation.)
 Conditional Models of
Identity Uncertainty with Application to Noun Coreference. Andrew
McCallum and Ben Wellner. Neural Information Processing Systems (NIPS),
2004. (A
model of object consolidation, based on graph partitioning with learned
edge weights. Conference paper version of 2003 work in KDD Workshop on
Data Cleaning.)
 An Integrated, Conditional Model of Information Extraction and Coreference with Application to Citation Matching. Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay. Conference on Uncertainty in Artificial Intelligence (UAI), 2004. (A conditionally-trained graphical model for identity uncertainty in relational domains, representing mentions, entities and their attributes. Also a first example of joint inference for extraction and identity uncertainty: coreference decisions actually integrate out uncertainty about information extraction.)
 Collective Segmentation and Labeling of Distant Entities in Information Extraction. Charles Sutton and Andrew McCallum. ICML workshop on Statistical Relational Learning, 2004. (Makes the boundaries and types of distant segments interdependent by augmenting a linear-chain CRF with additional long, arching edges. Approximate inference by Tree-Reparameterization.)
 An Exploration of Entity
Models, Collective Classification and Relation Description. Hema
Raghavan, James Allan and Andrew McCallum. KDD Workshop on Link
Analysis and Group Detection, August 2004. (Part
of a student synthesis project: includes an application of RMNs to
classifying people in newswire.)
 Sign Detection in Natural Images with Conditional Random Fields. Jerod Weinman, Al Hansen and Andrew McCallum. IEEE International Workshop on Machine Learning for Signal Processing, 2004. (Part of a student synthesis project: a grid-shaped CRF with inference by belief propagation with Tree-Reparameterization.)
 Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
 Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum. ICML 2004. (Joint inference over two traditionally separate layers of NLP processing: POS-tagging and NP-chunking. Introduces the CRF analogue of Factorial HMMs. Compares several approximate inference procedures.)
 Interactive Information Extraction with Constrained Conditional Random Fields. Trausti Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Nineteenth National Conference on Artificial Intelligence (AAAI 2004). San Jose, CA. (Winner of Honorable Mention Award.) (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction.)
 Accurate Information
Extraction from Research Papers using Conditional Random Fields.
Fuchun Peng and Andrew McCallum. Proceedings of Human Language
Technology Conference and North American Chapter of the Association for
Computational Linguistics (HLT-NAACL), 2004. (Applies
CRFs to extraction from research paper headers and reference sections,
to obtain current best-in-the-world accuracy. Also compares some simple
regularization methods.)
 Chinese Segmentation and
New Word Detection using Conditional Random Fields.
Fuchun Peng, Fangfang Feng, and Andrew McCallum. Proceedings of The
20th International Conference on Computational Linguistics (COLING
2004), August 23-27, 2004, Geneva, Switzerland. (State-of-the-art
Chinese word segmentation with CRFs, with rich features and many
lexicons; also using confidence estimation to add new words to the
lexicon.)
 Confidence Estimation
for Information Extraction.
Aron Culotta and Andrew McCallum. Proceedings of Human Language
Technology Conference and North American Chapter of the Association for
Computational Linguistics (HLT-NAACL), 2004, short paper. (How to
provide not only an answer, but a formally-justified confidence in
that answer, using constrained forward-backward.)
 A Note on Semi-supervised
Learning using Markov Random Fields. Wei Li and Andrew McCallum.
Technical Note, February 3, 2004. (A
general framework for semi-supervised learning in Conditional Random
Fields, with a focus on learning the distance metric between instances.
Experimental results with collective classification of documents.)
2003
 Dynamic Conditional
Random Fields for Jointly Labeling Multiple Sequences. Andrew
McCallum, Khashayar Rohanimanesh and Charles Sutton. NIPS*2003 Workshop
on Syntax, Semantics, Statistics, 2003. (Workshop
version of ICML 2004 paper.)
 Classification with
Hybrid Generative/Conditional Models. Rajat Raina, Yirong Shen,
Andrew Y. Ng, Andrew McCallum. Proceedings of Neural Information
Processing Systems (NIPS), 2003. (Train some
parameters generatively, some parameters conditionally.)
 Rapid Development of
Hindi Named Entity Recognition Using Conditional Random Fields and
Feature Induction. Wei Li and Andrew McCallum. ACM Transactions on
Asian Language Information Processing, 2003. (How
we developed a named entity recognition system for Hindi in just a few
weeks.)
 A Note on the
Unification of Information Extraction and Data Mining using
Conditional-Probability, Relational Models. Andrew McCallum and
David Jensen. IJCAI'03 Workshop on Learning Statistical Models from
Relational Data, 2003. (Describes
big-picture motivation and approach for research that performs
information extraction and data mining in an integrated fashion, rather
than in two separate serial steps. Lays out a major thrust of my
current research over a multi-year span.)
 Efficiently Inducing
Features of Conditional Random Fields. Andrew McCallum. Conference
on Uncertainty in Artificial Intelligence (UAI), 2003. (CRFs
give you the power to include a kitchen-sink's worth of features.
How do you decide which ones to include, to avoid overfitting and
running out of memory? A formal, information-theoretic approach, with
carefully-chosen approximations to make it efficient with millions of
candidate features. This technique was key to the success of the Hindi
work above, as well as to work by Pereira's group at UPenn.)
 Early Results for
Named Entity Recognition with Conditional Random Fields, Feature
Induction and Web-Enhanced Lexicons. Andrew McCallum and Wei Li.
Seventh Conference on Natural Language Learning (CoNLL), 2003. (This is the first publication about named entity
extraction with CRFs.)
 Table Extraction
Using Conditional Random Fields. David Pinto, Andrew McCallum, Xing
Wei and W. Bruce Croft. Proceedings of the ACM SIGIR, 2003. (Application of CRFs to finding tables in government
reports. Uses both language and layout features.)
 Object Consolidation
by Graph Partitioning with a Conditionally-trained Distance Metric.
Andrew McCallum and Ben Wellner. KDD Workshop on Data Cleaning, Record
Linkage and Object Consolidation, 2003. (Later,
improved version of workshop paper immediately below.)
 Toward Conditional
Models of Identity Uncertainty with Application to Proper Noun
Coreference. Andrew McCallum and Ben Wellner. IJCAI Workshop on
Information Integration on the Web, 2003. (A
conditionally-trained model of object consolidation, based on graph
partitioning with learned edge weights.)
 Challenges
in information retrieval and language modeling:
report of a workshop held at the Center for Intelligent Information
Retrieval, University of Massachusetts Amherst. James Allan et al. ACM
SIGIR Forum, Volume 37 Issue 1, April 2003. (A
report about fruitful areas for future work in IR over a five-year time
scale.)
2002
2001
2000
 Learning
to Understand the Web. William Cohen, Andrew McCallum, Dallan
Quass. IEEE
Data Engineering Bulletin. September 2000, Vol. 23, No. 3. Pages
17-24.
 Automating the
Construction of Internet Portals with Machine Learning.
Andrew McCallum, Kamal Nigam, Jason Rennie, Kristie Seymore.
Information Retrieval Journal, volume 3, pages 127-163. Kluwer. 2000.
 Maximum Entropy Markov
Models for Information Extraction and Segmentation. Andrew
McCallum, Dayne Freitag and Fernando Pereira. ICML-2000.
 Efficient Clustering
of High-Dimensional Data Sets with Application to Reference Matching.
Andrew McCallum, Kamal Nigam and Lyle Ungar. KDD-2000.
 Information
Extraction with HMM Structures Learned by Stochastic Optimization.
Dayne Freitag and Andrew McCallum. AAAI-2000.
 Creating Customized
Authority Lists. Huan Chang, David Cohn and Andrew McCallum.
ICML-2000.
 Semi-supervised
Clustering with User Feedback. David Cohn, Rich Caruana and Andrew
McCallum. Unpublished manuscript. (Submitted to AAAI 2000)
1999
 Multi-Label Text
Classification with a Mixture Model Trained by EM. Andrew McCallum.
Revised version of paper appearing in AAAI'99 Workshop on Text
Learning.
 A Hierarchical
Probabilistic Model for Novelty Detection in Text. Doug Baker,
Thomas Hofmann, Andrew McCallum and Yiming Yang. Unpublished
manuscript. (Submitted to NIPS'99.)
 Using Maximum
Entropy for Text Classification. Kamal Nigam, John Lafferty, Andrew
McCallum. IJCAI'99 Workshop on Information Filtering.
 Information
Extraction with HMMs and Shrinkage. Dayne Freitag and Andrew
McCallum. AAAI'99 Workshop on Machine Learning for Information
Extraction.
 Learning Hidden
Markov Model Structure for Information Extraction. Kristie Seymore,
Andrew McCallum, Roni Rosenfeld. AAAI'99 Workshop on Machine Learning
for Information Extraction.
 Building
DomainSpecific Search Engines with Machine Learning Techniques.
Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. AAAI-99
Spring Symposium. A related paper
was also accepted to IJCAI'99.
 Using Reinforcement
Learning to Spider the Web Efficiently. Jason Rennie and Andrew
McCallum. ICML'99.
 Bootstrapping for
Text Learning Tasks.
Rosie Jones, Andrew McCallum, Kamal Nigam and Ellen Riloff. IJCAI-99
Workshop on Text Mining: Foundations, Techniques and Applications.
1998
 A Comparison of
Event Models for Naive Bayes Text Classification. Andrew McCallum
and Kamal Nigam. AAAI-98 Workshop on "Learning for Text
Categorization".
 Improving Text
Classification by Shrinkage in a Hierarchy of Classes. Andrew
McCallum, Ronald Rosenfeld, Tom Mitchell and Andrew Ng. ICML-98.
 Employing EM in
Pool-Based Active Learning for Text Classification. Andrew McCallum
and Kamal Nigam. ICML-98.
 Distributional
Clustering of Words for Text Classification. Doug Baker, Andrew
McCallum. SIGIR-98.
 Text Classification
from Labeled and Unlabeled Documents using EM. Kamal Nigam, Andrew
McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3).
pp. 103-134. 2000.
 Learning to Classify
Text from Labeled and Unlabeled Documents. Kamal Nigam, Andrew
McCallum, Sebastian Thrun and Tom Mitchell. AAAI-98.
 Learning
to Extract Knowledge from the World Wide Web. Mark Craven, Dan
DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam,
Sean Slattery. AAAI-98.
1997
1996
 McCallum, R. Andrew,
Hidden State and Reinforcement Learning with Instance-Based State
Identification, IEEE Transactions on Systems, Man and Cybernetics
(Special issue on Robot Learning), 26(3):464-473, 1996.
 McCallum, R. Andrew, Learning
to Use Selective Attention and Short-Term Memory in Sequential Tasks,
in From Animals to Animats, Fourth International Conference on
Simulation of Adaptive Behavior, (SAB'96). Cape Cod, Massachusetts.
September, 1996.
1995
 McCallum, Andrew K.,
Reinforcement Learning with Selective Perception and Hidden State,
Ph.D. thesis. December, 1995.
 McCallum, R. Andrew,
Instance-Based Utile Distinctions for Reinforcement Learning, The
Proceedings of the Twelfth International Machine Learning Conference
(ML'95), Lake Tahoe, CA, 1995.
 McCallum, R. Andrew,
Instance-Based State Identification for Reinforcement Learning,
Advances in Neural Information Processing Systems (NIPS 7), 1995.
1994
 McCallum, R. Andrew,
First Results with Instance-Based State Identification for
Reinforcement Learning, URCS Tech Report 502, 1994.
 McCallum, R. Andrew,
Reduced Training Time for Reinforcement Learning with Hidden State,
The Proceedings of the Eleventh International Machine Learning Workshop
(Robot Learning), New Brunswick, NJ, 1994.
 McCallum, R. Andrew,
Short-Term Memory in Visual Routines for `Off-Road Car Chasing',
Working Notes of AAAI Spring Symposium Series, "Toward Physical
Interaction and Manipulation", Stanford University, March 21-23, 1994.
1993 and earlier
 McCallum, R. Andrew,
Overcoming Incomplete Perception with Utile Distinction Memory, The
Proceedings of the Tenth International Machine Learning Conference
(ML'93), Amherst, MA, 1993.
 McCallum, R. Andrew,
Learning with Incomplete Selective Perception, Thesis Proposal,
URCS Tech Report 453, 1993.
 Garrett, Scott, Bianchini, Kontothanassis, McCallum,
Thomas, Wisniewski and Luk,
Linking Shared Segments, Winter USENIX, San Diego, CA, 1993.
 McCallum, R. Andrew,
First Results with Utile Distinction Memory for Reinforcement Learning,
URCS Tech Report 446, 1992.
 McCallum, R. Andrew,
Using Transitional Proximity for Faster Reinforcement Learning, The
Proceedings of the Ninth International Machine Learning Conference
(ML'92), Aberdeen, Scotland, 1992.
 Garrett, Bianchini, Kontothanassis, McCallum, Thomas,
Wisniewski and Scott,
Dynamic Sharing and Backward Compatibility on 64-Bit Machines, URCS
Tech Report 418, 1992.
 McCallum, R. Andrew, and Spackman, Kent A.,
Using Genetic Algorithms to Learn Disjunctive Rules from Examples,
The Proceedings of the Seventh International Machine Learning
Conference (ML'90), Austin, Texas, 1990.
