Andrew McCallum Publications

Publications

2020

Simultaneously Linking Entities and Extracting Relations from Biomedical Text Without Mention-level Supervision. Trapit Bansal, Pat Verga, Neha Choudhary, Andrew McCallum. Conference of Association for the Advancement of Artificial Intelligence (AAAI) 2020.

2019

Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks. Pedram Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum. Proceedings of Neural Information Processing Systems (NeurIPS) 2019.
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering. Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, Andrew McCallum. EMNLP-IJCNLP Workshop on Machine Reading for Question Answering (MRQA, Best Paper Award), 2019.
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering. Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, Andrew McCallum. ArXiv preprint arXiv:1909.07598, 2019.
Scalable Hierarchical Clustering with Tree Grafting. Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael R Glass, Andrew McCallum. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019.
Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019.
Paper Matching with Local Fairness Constraints. Ari Kobren, Barna Saha, Andrew McCallum. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019.
Smoothing the Geometry of Probabilistic Box Embeddings. Xiang Li*, Luke Vilnis*, Dongxu Zhang, Michael Boratko and Andrew McCallum. International Conference on Learning Representations (ICLR) Oral presentation, 2019.
Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering. Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum, International Conference on Learning Representations (ICLR) 2019.
Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension. Rajarshi Das, Tsendsuren Munkhdalai, Eric Xingdi Yuan, Adam Trischler, Andrew McCallum. International Conference on Learning Representations (ICLR) 2019.
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders. Andrew Drozdov*, Patrick Verga*, Mohit Yadav*, Mohit Iyyer, and Andrew McCallum. Association of Computational Linguistics (ACL), 2019.
Optimal Transport-based Alignment of Learned Character Representations for String Similarity. Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum. Association of Computational Linguistics (ACL), 2019.
A2N: Attending to Neighbors for Knowledge Graph Inference. Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum. Association of Computational Linguistics (ACL), 2019.
Energy and Policy Considerations for Deep Learning in NLP. Emma Strubell, Ananya Ganesh and Andrew McCallum. Association of Computational Linguistics (ACL), 2019.
Supervised Hierarchical Clustering with Exponential Linkage. Nishant Yadav, Ari Kobren, Nicholas Monath, Andrew McCallum. International Conference on Machine Learning (ICML), 2019.
Integrating User Feedback under Identity Uncertainty in Knowledge Base Construction. Ari Kobren, Nicholas Monath, Andrew McCallum. Automated Knowledge Base Construction (AKBC), 2019.
The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures. Sheshera Mysore, Zach Jensen, Edward Kim, Kevin Huang, Haw-Shiuan Chang, Emma Strubell, Jeffrey Flanigan, Andrew McCallum, Elsa Olivetti. LAW XIII 2019: The 13th Linguistic Annotation Workshop (ACL WS), 2019.
Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks. Edward Kim, Zach Jensen, Alexander van Grootel, Kevin Huang, Matthew Staib, Sheshera Mysore, Haw-Shiuan Chang, Emma Strubell, Andrew McCallum, Stefanie Jegelka, and Elsa Olivetti. arXiv pre-print 1901.00032, in submission, 2019.

2018

Compact Representation of Uncertainty in Clustering. Craig Greenberg, Nicholas Monath, Ari Kobren, Patrick Flaherty, Andrew McGregor, Andrew McCallum. Neural Information Processing Systems (NIPS), 2018.
Embedded-State Latent Conditional Random Fields for Sequence Labeling. Dung Thai, Sree Harsha Ramesh, Shikhar Murty, Luke Vilnis and Andrew McCallum. Conference on Computational Natural Language Learning (CoNLL), 2018.
Linguistically-Informed Self-Attention for Semantic Role Labeling. Emma Strubell, Patrick Verga, Daniel Andor, David Weiss and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP, Best long paper award). Brussels, Belgium. October 2018.
Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets. Nathan Greenberg, Trapit Bansal, Patrick Verga, and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP short). Brussels, Belgium. October 2018.
Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings. Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai, Vinayak Mathur, Alfred Hough, and Andrew McCallum. TextGraphs-12: the Workshop on Graph-based Methods for Natural Language Processing, (NAACL HLT WS), 2018.
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset. Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael Witbrock. Association for Computational Linguistics Workshop on Machine Reading for Question Answering (ACL WS, Best paper award) 2018.
Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL? Emma Strubell and Andrew McCallum. Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (ACL WS). Melbourne, Australia. July 2018.
Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures. Luke Vilnis*, Xiang Lorraine Li*, Shikhar Murty, Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL) 2018.
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking. Shikhar Murty*, Patrick Verga*, Luke Vilnis, Irena Radovanovic and Andrew McCallum. The 56th Annual Meeting of the Association for Computational Linguistics oral presentation (ACL) 2018.
Learning Conditionally Calibrated Equations of State for Direct Fired sCO2 Cycles with Deep Neural Networks. Luke Vilnis, David Freed, Navid Rafati, Joe Camilo, Andrew McCallum. The 6th International Supercritical CO2 Power Cycles Symposium (sCO2), 2018.
Training Structured Prediction Energy Networks with Indirect Supervision. Amirmohammad Rooshenas, Aishwarya Kamath, Andrew McCallum. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT NAACL) 2018.
Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning. Rajarshi Das*, Shehzaad Dhuliawala*, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola and Andrew McCallum. International Conference on Learning Representations (ICLR) 2018.
Simultaneously Self-attending to All Mentions for Full-Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell and Andrew McCallum. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2018.
Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection. Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum. Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL HLT) 2018.

2017

Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning. (Workshop Version, see also ICLR 2018 conference paper.) Rajarshi Das*, Shehzaad Dhuliawala*, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola and Andrew McCallum. Neural Information Processing Systems Workshop on Automated Knowledge Base Construction (AKBC NIPS WS, Best paper award) 2017.
Finer Grained Entity Typing with TypeNet. Shikhar Murty, Patrick Verga, Luke Vilnis, and Andrew McCallum. 6th Workshop on Automated Knowledge Base Construction (AKBC NIPS WS) 2017.
Automatically Extracting Action Graphs From Materials Science Synthesis Procedures. Sheshera Mysore, Edward Kim, Emma Strubell, Ao Liu, Haw-Shiuan Chang, Srikrishna Kompella, Kevin Huang, Andrew McCallum and Elsa Olivetti. NIPS Workshop on Machine Learning for Molecules and Materials. Spotlight talk. (NIPS WS) 2017.
Attending to All Mention Pairs for Full Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell, Ofer Shai, and Andrew McCallum. 6th Workshop on Automated Knowledge Base Construction (AKBC NIPS WS) 2017.
Materials synthesis insights from scientific literature via text extraction and machine learning. Edward Kim, Kevin Huang, Adam Saunders, Andrew McCallum, Gerbrand Ceder, Elsa Olivetti. Chemistry of Materials 29 (21), 9436-9444. 2017.
Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples. Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum. Neural Information Processing Conference (NIPS) 2017.
Improved Representation Learning for Predicting Commonsense Ontologies. Xiang Lorraine Li, Luke Vilnis, Andrew McCallum. International Conference on Machine Learning Workshop on Deep Structured Prediction (ICML WS) 2017.
Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling. Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum. International Conference on Machine Learning Workshop on Deep Structured Prediction (ICML WS) 2017.
Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding. Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum. ArXiv preprint (ArXiv) 2017.

RelNet: End-to-end Modeling of Entities & Relations. Trapit Bansal, Arvind Neelakantan, Andrew McCallum. NIPS Workshop on Automated Knowledge Base Construction (NIPS AKBC WS) 2017.
Dependency Parsing with Dilated Iterated Graph CNNs. Emma Strubell, Andrew McCallum. 2nd Workshop on Structured Prediction for Natural Language Processing (EMNLP WS) 2017.
An Online Hierarchical Algorithm for Extreme Clustering. Ari Kobren, Nicholas Monath, Akshay Krishnamurthy, Andrew McCallum. Proceedings of Knowledge Discovery and Data Mining, oral presentation (KDD oral) 2017.
Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks. Rajarshi Das, Manzil Zaheer, Siva Reddy, Andrew McCallum. Association of Computational Linguistics, short paper (ACL short) 2017.
Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions. Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017.
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications. Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum. (SemEval) 2017.
End-to-End Learning for Structured Prediction Energy Networks. David Belanger, Bishan Yang, Andrew McCallum. International Conference on Machine Learning (ICML) 2017.
Learning a Natural Language Interface with Neural Programmer. Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei. Submitted to the International Conference on Learning Representations (ICLR), 2017.
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks. Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum. European Association of Computational Linguistics (EACL), 2017.
Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema. Patrick Verga, Arvind Neelakantan, Andrew McCallum. European Association of Computational Linguistics (EACL), 2017.

2016

Structured Prediction Energy Networks. David Belanger and Andrew McCallum. International Conference on Machine Learning (ICML), 2016.
Multilingual Relation Extraction using Compositional Universal Schema. Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew McCallum. North American Association of Computational Linguistics (NAACL), 2016.
Ask the GRU: Multi-task Learning for Deep Text Recommendations. Trapit Bansal, David Belanger, Andrew McCallum. Recommender Systems (RecSys), 2016.
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks. Teresa Martin and Fiete Botschen and Ajay Nagesh and Andrew McCallum. NAACL 2016 Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Incorporating Selectional Preferences in Multi-hop Relation Extraction. Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum. NAACL 2016 Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Row-less Universal Schema. Patrick Verga and Andrew McCallum. NAACL Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Extracting Multilingual Relations under Limited Resources: TAC 2016 Cold-Start KB construction and Slot-Filling using Compositional Universal Schema. Haw-Shiuan Chang, Abdurrahman Munir, Ao Liu, Johnny Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell and Andrew McCallum. Text Analysis Conference, Knowledge Base Population (TAC/KBP), 2016.

2015

Structured Prediction Energy Networks. David Belanger, Andrew McCallum. ArXiv pre-print, submitted to ICLR and rejected, 2015.
Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches. Evgeniy Gabrilovich, Ramanathan Guha, Andrew McCallum, Kevin Murphy. AAAI Spring Symposium Series Technical Report, 2015.
Multilingual Relation Extraction using Compositional Universal Schema. Pat Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew McCallum. ArXiv pre-print, submitted to ICLR, 2016.
Word Representations via Gaussian Embedding. Luke Vilnis, Andrew McCallum. International Conference on Learning Representations (ICLR) oral presentation, 2015.
Compositional Vector Space Models for Knowledge Base Inference. Arvind Neelakantan, Benjamin Roth, Andrew McCallum. AAAI Spring Symposium Series (AAAI-SS), 2015.
Bethe Projections for Non-Local Inference. Luke Vilnis, David Belanger, Dan Sheldon, Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI) 2015.
Learning Dynamic Feature Selection for Fast Sequential Prediction. Emma Strubell, Luke Vilnis, Kate Silverstein and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015. Outstanding paper award.
Compositional Vector Space Models for Knowledge Base Completion. Arvind Neelakantan, Benjamin Roth and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015.

2014

Training for Fast Sequential Prediction Using Dynamic Feature Selection. Emma Strubell, Luke Vilnis, and Andrew McCallum. NIPS Workshop on Modern Machine Learning and NLP (NIPS WS). Montreal, Quebec, Canada. December 2014.
Knowledge Base Completion using Compositional Vector Space Models. Arvind Neelakantan, Benjamin Roth and Andrew McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) 2014 at NIPS. Outstanding Paper Award.
Minimally Supervised Event Argument Extraction using Universal Schema. Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) at NIPS, Montreal, Quebec, Canada. December 2014.
Universal Schema for Slot-Filling, Cold-Start KBP and Event Argument Extraction: UMass IESL at TAC KBP 2014. Benjamin Roth, Emma Strubell, John Sullivan, Lakshmi Vikraman, Katherine Silverstein, and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '14 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2014.
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. Arvind Neelakantan, Jeevan Shankar, Alexandre Passos and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2014.
A Hierarchical Model for Universal Schema Relation Extraction. Arvind Neelakantan, Alexandre Passos, Andrew McCallum. Workshop on Automatic Creation and Curation of Knowledge Bases (WACCK) at SIGMOD, 2014.
Message Passing for Soft Constraint Dual Decomposition. David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum. Uncertainty in Artificial Intelligence (UAI), 2014.
Lexicon Infused Phrase Embeddings for Named Entity Resolution. Alexandre Passos, Vineet Kumar, Andrew McCallum. Conference on Computational Natural Language Learning (CoNLL), 2014.
Learning Soft Linear Constraints with Application to Citation Field Extraction. Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum. Proceedings of the Association for Computational Linguistics (ACL), 2014.

2013

Optimization and Learning in FACTORIE. Alexandre Passos, Luke Vilnis, Andrew McCallum. Neural Information Processing Systems Workshop on Optimization for Machine Learning (NIPS WS), 2013.
Marginal Inference in MRFs using Frank-Wolfe. David Belanger, Dan Sheldon, Andrew McCallum. Neural Information Processing Systems Workshop on Greedy Optimization, Frank-Wolfe and Friends (NIPS WS), 2013.
Anytime Belief Propagation Using Sparse Domains. Sameer Singh, Sebastian Riedel, Andrew McCallum. Neural Information Processing Systems Workshop on Resource-Efficient Machine Learning (NIPS WS), 2013.
Universal Schema for Slot Filling and Cold Start: UMass IESL at TACKBP. Sameer Singh, David Belanger, Ari Kobren, Michael Wick, Alexandre Passos, Harshal Pandya, Jinho Choi, Brian Martin, Andrew McCallum. Text Analysis Conference (TAC), 2013.
Universal Schema for Entity Type Prediction. Limin Yao, Sebastian Riedel, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
A Joint Model for Discovering and Linking Entities. Michael Wick, Sameer Singh, Harshal Pandya, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Assessing Confidence of Knowledge Base Content with an Experimental Study in Entity Resolution. Michael Wick, Sameer Singh, Ari Kobren, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Joint Inference of Entities, Relations, and Coreference. Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Dynamic Knowledge Base Alignment for Coreference Resolution. Jiaping Zheng, Luke Vilnis, Sameer Singh, Jinho Choi, Andrew McCallum. Seventeenth Conference on Computational Natural Language Learning (CoNLL), 2013.
Transition-based Dependency Parsing with Selectional Branching. Jinho D. Choi, Andrew McCallum. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
Open Scholarship and Peer Review: a Time for Experimentation. David Soergel, Adam Saunders, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
A New Dataset for Fine-Grained Citation Field Extraction. Sam Anzaroot, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
Large-scale Author Coreference via Hierarchical Entity Representations. Michael L Wick, Ari Kobren, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia. Sameer Singh, Amar Subramanya, Fernando Pereira, Andrew McCallum. Technical Report (TR) UMASS-CS-2012-015, October, 2012.
Relation Extraction with Matrix Factorization and Universal Schemas. Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum. Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2013.
Latent Relation Representations for Universal Schemas. Sebastian Riedel, Limin Yao, Andrew McCallum. International Conference on Learning Representations (ICLR), 2013.

2012

MAP Inference in Chains using Column Generation. David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum. Proceedings of Neural Information Processing (NIPS), 2012.
Probabilistic Databases of Universal Schema. Limin Yao, Sebastian Riedel and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC), 2012.
Human Machine Cooperation with Epistemological DBs: Supporting User Corrections to Automatically Constructed KBs. Michael Wick, Karl Schultz, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012. (Best paper runner-up)
Monte Carlo MCMC: Efficient Inference by Sampling Factors. Sameer Singh, Michael Wick, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012.
Monte Carlo MCMC: Efficient Inference by Approximate Sampling. Sameer Singh, Michael Wick, Andrew McCallum. Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2012.
Combining joint models for biomedical event extraction. David McClosky, Sebastian Riedel, Mihai Surdeanu, Andrew McCallum, Christopher Manning. BMC Bioinformatics, 2012.
Speeding up MAP with Column Generation and Block Regularization. David Belanger, Alexandre Passos, Sebastian Riedel and Andrew McCallum. ICML Workshop on Inferning: Interactions between Inference and Learning, (ICML WS), 2012.
Parse, Price and Cut - Delayed Column and Row Generation for Graph Based Parsers. Sebastian Riedel, David A. Smith and Andrew McCallum. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
A Discriminative Hierarchical Model for Fast Coreference at Large Scale. Michael Wick, Sameer Singh, Andrew McCallum. Association for Computational Linguistics (ACL), 2012.
Unsupervised Relation Discovery with Sense Disambiguation. Limin Yao, Sebastian Riedel and Andrew McCallum. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), 2012.
Topic Models for Taxonomies. Anton Bakalov, Andrew McCallum, Hanna Wallach and David Mimno. Proceedings of the Joint Conference on Digital Libraries (JCDL), 2012.
Selecting Actions for Resource-bounded Information Extraction using Reinforcement Learning. Pallika Kanani, Andrew McCallum. Web Search and Data Mining (WSDM), 2012.

2011

Correlations and anticorrelations in LDA inference. Alexandre Passos, Hanna Wallach, Andrew McCallum. Neural Information Processing Systems Workshop on Challenges in Learning Hierarchical Models: Transfer Learning and Optimization (NIPS WS), 2011.
Inducing Value Sparsity for Parallel Inference in Tree-shaped Models. Sameer Singh, Brian Martin, Andrew McCallum. Neural Information Processing Systems Workshop on Computational Trade-offs in Statistical Learning (NIPS WS), 2011.
Towards Asynchronous Distributed MCMC Inference for Large Graphical Models. Sameer Singh, Andrew McCallum. Neural Information Processing Systems Workshop on Algorithms, Systems, and Tools for Learning at Scale (NIPS WS), 2011.
Query Aware McMC. Michael Wick and Andrew McCallum. Proceedings of Neural Information Processing Systems (NIPS), 2011.
Toward Interactive Training and Evaluation. Greg Druck and Andrew McCallum. Conference on Information and Knowledge Management (CIKM), 2011.
Model Combination for Event Extraction in BioNLP. Sebastian Riedel, David McClosky, Mihai Surdeanu, Christopher D. Manning and Andrew McCallum. Proceedings of the Natural Language Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation. Sebastian Riedel and Andrew McCallum. Proceedings of the Natural Language Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
Inter-Event Dependencies support Event Extraction from Biomedical Literature. Roman Klinger, Sebastian Riedel and Andrew McCallum. Mining Complex Entities from Network and Biomedical Data (MIND), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2011.
Structured Relation Discovery using Generative Models. Limin Yao, Aria Haghighi, Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
Fast and Robust Joint Models for Biomedical Event Extraction. Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
Optimizing Semantic Coherence in Topic Models. David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
SampleRank: Training Factor Graphs with Atomic Gradients. Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum. Proceedings of the International Conference on Machine Learning (ICML), 2011.
Database of NIH grants using machine-learned categories and graphical clustering. Edmund M Talley, David Newman, David Mimno, Bruce W Herr II, Hanna M Wallach, Gully Burns, Miriam Leenders, Andrew McCallum. Nature Methods, 8, 443–444, 27 May 2011.
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models. Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum. Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011.

2010

An Introduction to Conditional Random Fields. Charles Sutton, Andrew McCallum. Foundations and Trends in Machine Learning (FnT ML), to appear.
Distantly labeling data for large scale cross-document coreference. Sameer Singh, Michael Wick, Andrew McCallum. Technical report on arXiv (TR), 2010.
Distributed MAP Inference for Undirected Graphical Models. Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum. Neural Information Processing Systems Workshop on Learning on Cores, Clusters, and Clouds (NIPS WS), 2010.
Machine Translation Using Overlapping Alignments and SampleRank. Benjamin Roth, Andrew McCallum, Marc Dymetman and Nicola Cancedda. Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (AMTA), 2010.
High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models. Gregory Druck, Andrew McCallum. International Conference on Machine Learning (ICML), 2010.
Constraint-Driven Rank-Based Learning for Information Extraction. Sameer Singh, Limin Yao, Sebastian Riedel, Andrew McCallum. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT),
Collective Cross-Document Relation Extraction Without Labelled Data. Limin Yao, Sebastian Riedel, Andrew McCallum. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2010.
Modeling Relations and Their Mentions without Labeled Text. Sebastian Riedel, Limin Yao, Andrew McCallum. Proceedings of the European Conference on Machine Learning (ECML/PKDD), 2010.
Resource-bounded Information Extraction: Acquiring Missing Feature Values On Demand. Pallika H. Kanani, Andrew McCallum, Shaohan Hu. Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2010. (Best paper runner-up.)
Scalable Probabilistic Databases with Factor Graphs and MCMC. Michael Wick, Andrew McCallum, Gerome Miklau. Proceedings of the International Conference on Very Large Databases (VLDB), 2010.

2009

FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs. Andrew McCallum, Karl Schultz, Sameer Singh. Neural Information Processing Systems (NIPS), 2009.
Rethinking LDA: Why Priors Matter. Hanna Wallach, David Mimno, Andrew McCallum. Neural Information Processing Systems (NIPS), 2009.
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference. Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum. Neural Information Processing Systems (NIPS), 2009.
SampleRank: Learning Preferences from Atomic Gradients. Michael Wick, Khashayar Rohanimanesh, Aron Culotta, Andrew McCallum. Neural Information Processing Systems Workshop on Advances in Ranking (NIPS WS), 2009.
Bi-directional Joint Inference for Entity Resolution and Segmentation using Imperatively-Defined Factor Graphs. Sameer Singh, Karl Schultz, Andrew McCallum. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2009.
Efficient Methods for Topic Model Inference on Streaming Document Collections. Limin Yao, David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2009, Paris, France.
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment. Kedar Bellare and Andrew McCallum. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
Polylingual Topic Models. David Mimno, Hanna Wallach, Jason Naradowsky, David Smith and Andrew McCallum. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
Active Learning by Labeling Features. Gregory Druck, Burr Settles, Andrew McCallum. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP).
Inference and Learning in Large Factor Graphs with Adaptive Proposal Distributions. Khashayar Rohanimanesh, Michael Wick, Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2009-008 (TR), 2009
Advances in Learning and Inference for Partition-wise Models of Coreference Resolution. Michael Wick and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2009-028 (TR), 2009
Representing Uncertainty in Databases with Scalable Factor Graphs. Michael Wick, Masters Thesis/Synthesis. Readers: Andrew McCallum and Gerome Miklau. April 2009
An Entity Based Model for Coreference Resolution. Michael Wick, Aron Culotta, Khashayar Rohanimanesh, Andrew McCallum. Proceedings of the SIAM International Conference on Data Mining (SDM), Reno, Nevada, 2009
Alternating Projections for Learning with Expectation Constraints. Kedar Bellare, Gregory Druck and Andrew McCallum. Uncertainty in Artificial Intelligence (UAI), 2009
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria. Gregory Druck, Gideon Mann, Andrew McCallum. Proceedings of the Association for Computational Linguistics (ACL).
Towards Theoretical Bounds for Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani, Andrew McCallum, Ramesh Sitaraman. UMass TechReport UM-CS-2009-027 (TR), 2009.
Generalized Expectation Criteria with application to Semi-Supervised Classification and Sequence Modeling. Gideon Mann and Andrew McCallum. Journal of Machine Learning Research (JMLR). To appear.

2008

Reinforcement Learning for MAP Inference in Large Factor Graphs. Khashayar Rohanimanesh, Michael Wick, Sameer Singh, and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2008-040 (TR), 2008
Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. David Mimno, Hanna Wallach and Andrew McCallum. NIPS Workshop on Analyzing Graphs, (NIPS WS), 2008, Whistler, BC.
FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning. Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh. NIPS Workshop on Probabilistic Programming, (NIPS WS), 2008. (Discriminatively trained undirected graphical models, or conditional random fields, have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. Although there has been much historic interest in the combination of logic and probability, we argue that in this mixture 'logic' is largely a red herring. The power in relational models is in their repeated structure and tied parameters; and logic is not necessarily the best way to define these structures. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an object-oriented imperative language to express various aspects of model structure, inference and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented our ideas in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional JVM language named Scala.)
A Discriminative Approach to Ontology Alignment. Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, and AnHai Doan. In the International Workshop on New Trends in Information Integration (NTII) at the conference for Very Large Databases (VLDB WS), Auckland, New Zealand, 2008. (New state-of-the-art results on ontology alignment using graph-shaped conditional random fields, joint inference, and parameter estimation by Rank-Based Training.)
A Unified Approach for Schema Matching, Coreference, and Canonicalization. Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Information integration, performing joint inference over schema matching, entity resolution and canonicalization, using conditional random fields, features encoding clauses in first-order logic, and efficient inference by Metropolis-Hastings. Positive experimental results on multiple data sets.)
Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Hierarchical Dirichlet process model that jointly clusters citation venue strings based on both string-edit distance and title information.)
Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors. Hanna Wallach, Charles Sutton, Andrew McCallum. In International Conference on Machine Learning, Workshop on Prior Knowledge for Text and Language Processing. (ICML WS), 2008. (Two Bayesian dependency parsing models: 1. Model with Pitman-Yor prior that significantly improves Eisner's classic model; 2. Latent-variable model that learns "syntactic" topics.)
Learning from Labeled Features using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. Proceedings of ACM Special Interest Group on Information Retrieval, (SIGIR), 2008. (Learn classifiers by labeling features rather than instances. Extensive evaluation on many text data sets, showing substantial improvement over other methods of semi-supervised learning.)
Learning to Predict the Quality of Contributions to Wikipedia. Gregory Druck, Gerome Miklau and Andrew McCallum. AAAI Workshop on Wikipedia and AI, (AAAI WS), 2008. (Predict the longevity of an edit to Wikipedia, using textual features of the edit as well as features of the editor. Could be part of a tool to prioritize verification of changes to Wikipedia.)
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. David Mimno and Andrew McCallum. (Plenary presentation.) Conference on Uncertainty in Artificial Intelligence, (UAI), 2008. (Text documents are usually accompanied by metadata, such as the authors, the publication venue, the date, and any references. Work in topic modeling that has taken such information into account, such as Author-Topic, Citation-Topic, and Topic-over-Time models, has generally focused on constructing specific models that are suited only for one particular type of metadata. This paper presents a simple, unified model for learning topics from documents given arbitrary non-textual features, which can be discrete, categorical, or continuous.)
Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields. Gideon Mann and Andrew McCallum. Proceedings of Association of Computational Linguistics, (ACL), 2008. (Generalized expectation for semi-supervised learning of linear-chain conditional random fields.)
Piecewise Training for Structured Prediction. Charles Sutton and Andrew McCallum. Accepted to the Machine Learning Journal, (MLJ), 2008. (Efficiently train CRFs in parts. It works well even though full joint inference is used at test time.)
Pachinko Allocation: Scalable Mixture Models of Topic Correlations. Wei Li and Andrew McCallum. Submitted to the Journal of Machine Learning Research, (JMLR), 2008. (The pachinko allocation model represents nested correlations among topics using a DAG. This paper's new work is in efficiently fitting these models (as well as plain old LDA) by creating and leveraging sparsity in the distribution over topics to be sampled for each document.)

2007

Unsupervised Coreference of Publication Venues. Robert Hall, Charles Sutton and Andrew McCallum. University of Massachusetts Amherst Technical Report, (TR), 2007. (A generative non-parametric mixture model for entity resolution of publication venues that leverages both the venue titles and distributions over words in paper titles.)
Generalized Expectation Criteria. Andrew McCallum, Gideon Mann and Gregory Druck. University of Massachusetts Amherst Technical Report #2007-60, (TR), 2007. (This note introduces and motivates Generalized Expectation (GE) criteria. GE criteria are terms in a parameter-estimation objective function that express preferences about model expectations. In certain simple cases, GE falls into the same equivalence class as moment matching, maximum likelihood and maximum entropy estimation. However, our work focuses on leveraging GE's special flexibility in three non-traditional ways: (1) GE criteria can be specified independently of the model parameterization. In factor graphs, we break the traditional one-to-one mapping between (a) subsets of variables participating in parameterized model factors and (b) subsets of variables over which the objective function's expectations are calculated. (2) Within the same objective function, multiple GE terms that are conditional expectations can be conditioned on multiple different data sets. This is useful for semi-supervised learning and transfer learning. (3) A target expectation (or more generally the expectation preference function) can come from any source, including other tasks or human domain knowledge. GE is the successor to Expectation Regularization, which is described in our ICML 2007 paper below.)
Reducing Annotation Effort using Generalized Expectation Criteria--DRAFT. Gregory Druck, Gideon Mann and Andrew McCallum. University of Massachusetts Amherst Technical Report #2007-62, (TR), 2007. (A version of Generalized Expectation (GE) in which the supervision is provided by labeling features instead of instances. Dramatically faster wall-clock labeling to achieve high accuracy. Experiments on document classification.)
Community-based Link Prediction with Text. David Mimno, Hanna M. Wallach and Andrew McCallum. In Proceedings of the NIPS 2007 Workshop on Statistical Network Modeling (NIPS WS), 2007. (New state-of-the-art results in link-prediction using a latent-variable topic model, in which "community" variables are associated with topic distributions and author distributions. Thus the model combines the use of language/topics and co-authorships to discover communities.)
Leveraging Existing Resources using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. NIPS Workshop on Learning Problem Design, (NIPS WS), 2007. (Generalized Expectation applied in situations in which there is no labeled data. All supervision is obtained from existing auxiliary resources such as lexicons. Experiments on information extraction.)
Lightly-Supervised Attribute Extraction for Web Search. Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum and Mark Dredze. NIPS Workshop on Machine Learning for Web Search, (NIPS WS), 2007. (Extract a large number of attributes of different entities from natural language text. Methods based on co-training and maximum entropy classifiers.)
People-LDA: Anchoring Topics to People Using Face Recognition. Vidit Jain, Erik Learned-Miller, and Andrew McCallum. International Conference on Computer Vision (ICCV), 2007. (Jointly model people's identity, face appearance in an image, and surrounding text in the image captions with an LDA-style topic model. Improved results in identifying coherent sets of person "mentions"---that is, improved co-reference by using both text and image features.)
Joint Group and Topic Discovery from Relations and Text. Andrew McCallum, Xuerui Wang and Natasha Mohanty, Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science 4503, pp. 28-44, (Book chapter), 2007. (Book chapter version of NIPS 2006 conference paper. Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval. Xuerui Wang, Andrew McCallum and Xing Wei, Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), 2007. (A topic model in the LDA style that uses a Markov model to automatically discover topically-relevant arbitrary-length phrases, not just lists of single words. The phrase discovery is not simply a post-processing step, but an intrinsic part of the model that helps it discover better topics. Experiments on document retrieval tasks.)
Canonicalization of Database Records using Adaptive Similarity Measures. Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Defines and explores the problem of "canonicalization"---selecting the best field values for a single, standard record formed from a set of consolidated, co-resolved information sources, such as arise from merging databases, or combining multiple sources of information extraction.)
Generalized Component Analysis for Text with Heterogeneous Attributes. Xuerui Wang, Chris Pal and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (A topic model based on an undirected graphical model, which makes it easier to incorporate multiple modalities.)
Semi-Supervised Classification with Hybrid Generative/Discriminative Methods. Greg Druck, Chris Pal, Xiaojin Zhu and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Leverage unlabeled data for text classification by using an objective function that combines (1) joint probability of labels and words and (2) conditional probability of labels given words.)
Expertise Modeling for Matching Papers with Reviewers. David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (The Author-Persona-Topic model is an LDA-style topic model especially designed to represent expertise as a mixture of topical intersections. We show positive results in matching reviewers to conference papers, as assessed by human judgements.)
Learning Extractors from Unlabeled Text using Relevant Databases. Kedar Bellare and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Use conditional random fields to learn information extractors both from DB fields and from alignments of DB in free text. Uses an Alignment CRF, similar to our UAI 2005 paper.)
Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes. Pallika Kanani and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Improve entity resolution by adding web pages as new "mentions" to the graph-partitioning problem, and do so efficiently by selecting a subset of the possible queries and a subset of the returned pages.)
Probabilistic Representations for Integrating Unreliable Data Sources. David Mimno and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Probabilistic representation of field values used in merging and augmenting information from DBLP and research paper PDFs.)
Author Disambiguation using Error-Driven Machine Learning With a Ranking Loss Function. Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Entity resolution of people using high-order features, made efficient with Metropolis-Hastings and SampleRank, a learning method based on ranking.)
Nonparametric Bayes Pachinko Allocation. Wei Li, David Blei and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (A version of pachinko allocation that automatically determines the number of topics (and super-topics), and its sparse connectivity structure by Dirichlet process priors. Positive results in rediscovering known structure in synthetic data, and in held-out likelihood versus PAM, hLDA and HDP.)
Improved Dynamic Schedules for Belief Propagation. Charles Sutton and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (Significantly faster inference in graphical models by selecting which BP messages to send based on an approximation to their residual.)
Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. Gideon Mann and Andrew McCallum. International Conference on Machine Learning (ICML), 2007. (Semi-supervised learning is seldom used in real applications because it is often complicated to implement, fragile in tuning or inefficient for large data. We introduce a new highly usable approach to semi-supervised learning, augmenting traditional label log-likelihood with an additional term that encourages model predictions on unlabeled data to match certain expectations. Positive results on 5 data sets versus EM, transductive SVM, entropy regularization and a graph-based method.)
Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. ICML, 2007. (Train a large CRF five times faster by dividing it into separate pieces and reducing numbers of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
Mixtures of Hierarchical Topics with Pachinko Allocation. David Mimno, Wei Li and Andrew McCallum. ICML, 2007. (From a large document collection automatically discover topic hierarchies, where documents may be flexibly represented as mixtures across multiple leaves, not just mixtures up and down a single leaf-root path. Thus, for example, we can represent a document about instructing a robot in natural language, where those two topics are leaves. This new model, hPAM, combines the best of pachinko allocation (PAM) and hierarchical LDA (hLDA). Dramatic improvements in held-out data likelihood and mutual information between discovered topics and human-assigned categories.)
Transfer Learning for Enhancing Information Flow in Organizations and Social Networks. Chris Pal, Xuerui Wang and Andrew McCallum. Submitted to Conference on Email and Spam (CEAS), 2007. Technical Note. (Continuous hidden variable conditional random field for CC prediction/suggestion in email.)
Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email. Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel. Journal of Artificial Intelligence Research (JAIR), 2007. (Journal paper version of IJCAI conference paper on Author-Recipient-Topic (ART) model.)
Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields. Gideon Mann and Andrew McCallum. NAACL/HLT, (short paper) 2007. (A new, faster dynamic program for calculating the entropy of a finite-state subsequence and its gradient.)
First-Order Probabilistic Models for Coreference Resolution. Aron Culotta, Michael Wick, Robert Hall and Andrew McCallum. NAACL/HLT, 2007. (Traditional coreference uses features only over pairs of mentions. Here we present a conditional random field with first-order logic for expressing features, enabling features over sets of mentions. The result is a new state-of-the-art result on ACE 2004 coref, jumping from 69 to 79---a 45% reduction in error. The advance depends crucially on a new method of parameter estimation for such "weighted logic" models based on learning rankings and error-driven training.)
Sparse Message Passing Algorithms for Weighted Maximum Satisfiability. Aron Culotta, Andrew McCallum, Bart Selman, Ashish Sabharwal. New England Student Symposium on Artificial Intelligence (NESCAI), 2007. (A new algorithm for solving weighted maximum satisfiability (WMAX-SAT) problems that divides a large problem into sub-problems, and coordinates the global solution by message passing with sparse messages. Inspired by the desire to do joint-inference in (a) large weighted logics à la Markov Logic Networks, (b) large NLP pipelines, in which there are efficient pre-existing (dynamic programming) solutions to sub-parts of the pipeline. Positive results versus WalkSAT!)
Cryptogram Decoding for OCR using Numerization Strings. Gary Huang, Erik Learned-Miller and Andrew McCallum. ICDAR, 2007. (Robust OCR without font appearance models by incorporating language modeling.)
Penn/UMass/CHOP BiocreativeII Systems. Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, and Peter White. BiocreativeII Evaluation Workshop. 2007. (Description of our high-ranking entry in the competition for extraction and linkage from bioinformatics text.)
Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani and Andrew McCallum. Conference on Computational Learning Theory (COLT) Open Problems Track, 2007. (We present a new class of problems in which the goal is to perform correlation clustering under circumstances in which accuracy can be improved by augmenting the given graph with additional information.)
Organizing the OCA: Learning faceted subjects from a library of digital books. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (Introduces the DCM-LDA topic model, which represents topics by a Dirichlet-compound-multinomial rather than a multinomial. In addition to obtaining interesting information about the different variances of the topics, this model lends itself to efficient parallelization with very coarse-grained synchronization. The result is a topic model that can run on over 1 billion words in just a few hours.)
Mining a digital library for influential authors. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (A probabilistic model that ranks authors based on their influence on particular areas of scientific research. Integrates topics with citation patterns.)
Improving Author Coreference by Resource-bounded Information Gathering from the Web. Pallika Kanani, Andrew McCallum and Chris Pal. International Joint Conference on Artificial Intelligence (IJCAI), 2007. (Sometimes there is simply insufficient information to make an accurate entity resolution decision, and we must gather additional evidence. This paper describes the use of web queries to improve research paper author coreference, exploring two methods of augmenting a graph partitioning problem: using the web to obtain new features on existing edges, and using the web to obtain new nodes in the graph. We then go on to describe decision-theoretic approaches for maximizing accuracy gain with a limited budget of web queries, and demonstrate our methods on three large data sets.)
Dynamic Conditional Random Fields. Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh. Journal of Machine Learning Research (JMLR), Vol. 8(Mar), pages 693-723, 2007. (Journal paper version of ICML paper by the same authors, with new experiments on marginal likelihood training.)

2006

On Discriminative and Semi-Supervised Dimensionality Reduction. Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum. Advances in Neural Information Processing Systems, Workshop on Novel Applications of Dimensionality Reduction, (NIPS Workshop), 2006. (Using Multi-Conditional Learning, learn to distribute mixture components just where needed to address some discriminative task. See compelling figure on synthetic overlapping spiral data.)
Learning Field Compatibilities to Extract Database Records from Unstructured Text. Michael Wick, Aron Culotta and Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2006. (Record extraction, jointly accounting for multi-field compatibility by content and layout features.)
Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model. Positive results on entity resolution (of research paper authors) are described.)
Corrective Feedback and Persistent Learning for Information Extraction. Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola. Artificial Intelligence Journal (AIJ), volume 170, pages 1101-1122, 2006. (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction. Journal paper version of AAAI paper by the same authors below. Adds experiments with active learning.)
CC Prediction with Graphical Models. Chris Pal and Andrew McCallum. Conference on Email and Anti-Spam (CEAS), 2006. (Help keep an organization coordinated by suggesting who to carbon-copy on your outgoing email message.)
Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta, Andrew McCallum. HLT Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006. (Markov Logic Networks are Conditional Random Fields that use first-order logic to define features and parameter tying patterns. Making such models scale to non-trivial data set sizes is a challenge because the size of the full instantiation of the model is exponential in the arity of the formulae. Here we describe a method of partial instantiation that allows such models to scale to entity resolution problems with millions of entity mentions. On both citation and author entity resolution problems we show that including such first-order features provides increases in accuracy.)
A Continuous-Time Model of Topic Co-occurrence Trends. Xuerui Wang, Wei Li, and Andrew McCallum. AAAI Workshop on Event Detection, 2006. (Capture the time distributions not only of topics, but also of their co-occurrences. For example, notice that while NLP and ML have both been around for a long time, their co-occurrence has been rising recently. The model is effectively a combination of the Pachinko Allocation Model (PAM) and Topics-Over-Time (TOT).)
Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning. Michael Kelm, Chris Pal, and Andrew McCallum. Draft accepted to the International Conference on Pattern Recognition (ICPR), 2006. (Multi-conditional learning explored in the context of computer vision.)
Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. Andrew McCallum, Chris Pal, Greg Druck, Xuerui Wang. AAAI, 2006. (Estimate parameters of an undirected graphical model not by joint likelihood, or conditional likelihood, but by a product of multiple conditional likelihoods. Can act as an improved regularizer. With latent variables, can cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models and (cross-cutting) conditional-training. Improved results on document classification, Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task.)
Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. Wei Li, and Andrew McCallum. ICML, 2006. (An LDA-style topic model that captures correlations between topics, enabling discovery of finer-grained topics. Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary, nested and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and held-out data likelihood.)
Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Xuerui Wang and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD) 2006. (A new LDA-style topic model that models trends over time. The meaning of a topic remains fixed and reliable, but its prevalence over time is captured, and topics may thus focus in on co-occurrence patterns that are time-sensitive. Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Improvements in topic saliency and the ability to predict time given words.)
Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition. Shaolei L. Feng, R. Manmatha and Andrew McCallum. IEEE International Conference on Document Image Analysis for Libraries (DIAL 06), pp. 30-37. 2006. (Mixed results on CRFs applied to handwritten word recognition.)
Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. HLT-NAACL, 2006. (Separately train CRFs with different subsets of the features, then integrate them at test time---four different variations on the method. Especially make more reliable use of lexicon features and other highly-predictable but brittle features.)
Integrating Probabilistic Extraction Models and Relational Data Mining to Discover Relations and Patterns in Text. Aron Culotta, Andrew McCallum and Jonathan Betz. HLT-NAACL, 2006. (Extract relations from Wikipedia articles. Run data mining on the relational graph to obtain patterns that are predictive of relations---such as "opponent of my opponent is my ally" and "a person is likely to have the same religion as their parents." Then use features derived from these patterns in a second run of extraction that improves accuracy.)
Bibliometric Impact Measures Leveraging Topic Analysis. Gideon Mann, David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL) 2006. (Use a new topic model that leverages n-grams to discover interpretable, fine-grained topics in over a million research papers. Use these topic divisions as well as automated citation analysis to extend three existing bibliometric impact measures, and create three new ones: Topical Diversity, Topical Transfer, Topical Precedence.)
An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. Book chapter in Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006. (An overview and introduction to conditional random fields for beginners and experts alike---motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, tips and tricks, an example application to information extraction with a skip-chain structure.)
Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, and Andrew McCallum. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006. (An alternative method for beam-search based on variational principles. Enables not only faster test-time performance of large-state-space CRFs, but this method makes beam search robust enough to be used at training time, enabling dramatically faster learning of discriminative finite-state methods for speech, IE and other applications.)
Table extraction for answer retrieval. Xing Wei, Bruce Croft and Andrew McCallum. Information Retrieval Journal (IRJ), volume 9, issue 5, pages 589-611, November 2006. (Information extraction from tables, using conditional random fields with language and layout features, with application to question answering. Journal paper version of our SIGIR 2003 paper.)
Semi-supervised Text Classification Using EM. Kamal Nigam, Andrew McCallum and Tom Mitchell. Book chapter in Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006. (Overview, description, experiments on using expectation maximization with naive Bayes text classifiers for learning from labeled and unlabeled data. A chapter in a book about various methods of semi-supervised learning.)
Group and Topic Discovery from Relations and Their Attributes. Xuerui Wang, Natasha Mohanty and Andrew McCallum. Neural Information Processing Systems (NIPS), 2006. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)

2005

A Note on Topical N-grams. Xuerui Wang and Andrew McCallum. University of Massachusetts Technical Report UM-CS-2005-071, 2005. (Discover topics like Latent Dirichlet Allocation, but model phrases in addition to single words on a per-topic basis. For example, in the Politics topic, "white house" has special meaning as a collocation, while in the RealEstate topic, modeling the individual words is sufficient. Our TNG model produces much cleaner, more interpretable topics.)
Pachinko allocation: A Directed Acyclic Graph for Topic Correlations. Wei Li and Andrew McCallum. NIPS Workshop on Nonparametric Bayesian Methods, 2005. (Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and classification, as well as improved held-out likelihood over CTM. See ICML 2006 paper above.)
Direct Maximization of Rank-Based Metrics for Information Retrieval. Don Metzler, W. Bruce Croft and Andrew McCallum. CIIR Technical Report IR-429, 2005.
Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, volume 3, Number 9, November 2005. (An overview of information extraction by machine learning methods, written for people not familiar with machine learning, especially CTOs and other people in business.)
Learning Clusterwise Similarity with First-order Features. Aron Culotta and Andrew McCallum. NIPS Workshop on the Theoretical Foundations of Clustering. 2005. (Discriminatively-trained graph-partitioning methods for clustering, with features over entire clusters, including existential and universal quantifiers. Efficiently instantiate these features only on demand.)
Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks---so that the final target task can actually affect the output of the intermediate task.)
Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common under-appreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements---including new best results on CoNLL-2000 NP Chunking.)
Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005. (Further results with "piecewise training", a method also described in a UAI'05 paper below.)
Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta and Andrew McCallum. Technical Report IR-430, University of Massachusetts, September 2005. (Use existential and universal quantifiers in Markov Logic, doing so practically and efficiently by incrementally instantiating these terms as needed. Applied to object correspondence, this model combines the expressivity of BLOG with the predictive accuracy advantages of conditional probability training. Experiments on citation matching and author disambiguation.)
Joint Deduplication of Multiple Record Types in Relational Data. Aron Culotta and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005.
(Longer Tech Report version: A Conditional Model of Deduplication for Multi-type Relational Data. Technical Report IR-443, University of Massachusetts, September 2005.) (Leverage relations among multiple entity types to perform coreference collectively among all types. Uses CRF-style graph partitioning with a learned distance metric. Experimental results on joint coreference of both citations and their venues showing that accuracy on both improves.)
Collective Multi-Label Classification. Nadia Ghamrawi and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Multi-label document classification with a conditional maximum entropy model that captures not only the traditional dependencies between words and the class labels, but also the co-occurrence dependencies between the class labels. Performs joint inference among all class labels.)
Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis. Andrew McCallum, Xuerui Wang and Chris Pal. UMass Technical Report UM-CS-2005-053, version 2.1. 2005. (Cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models that are conditionally-trained. Improved results over Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. This is an evolving Tech Report, which needs to be updated---in particular we are now referring to this method as "Multi-Conditional Learning" or "Multi-Conditional Mixtures".)
Group and Topic Discovery from Relations and Text. Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation. Yu Gu, Andrew McCallum and Don Towsley. Internet Measurement Conference, 2005. (Build a density model of normal Internet traffic with Maximum Entropy and feature induction. Detect network attacks by density threshold.)
A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Andrew McCallum, Kedar Bellare and Fernando Pereira. Conference on Uncertainty in AI (UAI), 2005. (Train a string edit distance function from both positive and negative examples of string pairs (matching and mismatching). Significantly, the model designer is free to use arbitrary, fancy features of both strings, and also very flexible edit operations. This model is an example of an increasingly popular interesting class---conditionally-trained models with latent variables. Positive results on citations, addresses and names.)
Joint Parsing and Semantic Role Labeling. Charles Sutton and Andrew McCallum. CoNLL (Shared Task), 2005. (Attempt to improve accuracy by performing joint inference over parsing and semantic role labeling---preserving uncertainty and multiple hypotheses in Dan Bikel's parser. Unfortunately the effort yielded negative results, most likely because the components needed to produce better calibrated probabilities.)
Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)
Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with Role-Author-Recipient-Topic model. Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
Piecewise Training for Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently train a large graphical model in separately normalized pieces, and amazingly often obtain higher accuracy than without this approximation. This paper also shows that this piecewise objective is a lower bound on the exact likelihood, and gives results with three different graphical model structures.)
Constrained Kronecker Deltas for Fast Approximate Inference and Estimation. Chris Pal, Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes the graph of the graphical model is not large and complex, but the cardinality of the variables is large. This paper describes a new and generalized method for beam search on graphical models, showing positive experimental results for both inference and training. Experiments on NetTalk.)
Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once--made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)

2004

Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs. Andrew McCallum and Charles Sutton. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-383. 2004. (Also presented at NIPS 2004 Workshop on Learning with Structured Outputs.) (Large undirected graphical models are expensive to train because they require global inference to calculate the gradient of the parameters. We describe a new method for fast training in locally-normalized pieces. Amazingly the resulting models also give higher accuracy than their globally-trained counterparts.)
Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora. Ron Bekkerman, Andrew McCallum and Gary Huang. UMass CIIR Technical Report IR-418, 2004. (Extensive experiments on real-world email foldering.)
The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang. Technical Report UM-CS-2004-096, 2004. (Also presented at the NIPS'04 Workshop on "Structured Data and Representations in Probabilistic Models for Categorization".) (Social network analysis that not only models links between people, but the word content of the messages exchanged between them. Discovers salient topics guided by the sender-recipient structure in data, and provides improved ability to measure role-similarity between people. A generative model in the style of Latent Dirichlet Allocation.)
Conditional Models of Identity Uncertainty with Application to Noun Coreference. Andrew McCallum and Ben Wellner. Neural Information Processing Systems (NIPS), 2004. (A model of object consolidation, based on graph partitioning with learned edge weights. Conference paper version of 2003 work in KDD Workshop on Data Cleaning.)
An Integrated, Conditional Model of Information Extraction and Coreference with Application to Citation Matching. Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay. Conference on Uncertainty in Artificial Intelligence (UAI), 2004. (A conditionally-trained graphical model for identity uncertainty in relational domains, representing mentions, entities and their attributes. Also a first example of joint inference for extraction and identity uncertainty--coreference decisions actually integrate out uncertainty about information extraction.)
Collective Segmentation and Labeling of Distant Entities in Information Extraction. Charles Sutton and Andrew McCallum. ICML workshop on Statistical Relational Learning, 2004. (Makes the boundaries and types of distant segments inter-dependent by augmenting a linear-chain CRF with additional long, arching edges. Approximate inference by Tree-Reparameterization.)
An Exploration of Entity Models, Collective Classification and Relation Description. Hema Raghavan, James Allan and Andrew McCallum. KDD Workshop on Link Analysis and Group Detection, August 2004. (Part of a student synthesis project: includes an application of RMNs to classifying people in newswire.)
Sign Detection in Natural Images with Conditional Random Fields. Jerod Weinman, Al Hansen and Andrew McCallum. IEEE International Workshop on Machine Learning for Signal Processing, 2004. (Part of a student synthesis project: a grid-shaped CRF with inference by belief-propagation with Tree-Reparameterization.)
Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum. ICML 2004. (Joint inference over two traditionally-separate layers of NLP processing: POS-tagging and NP-chunking. Introduces the CRF analogue of Factorial HMMs. Compares several approximate inference procedures.)
Interactive Information Extraction with Constrained Conditional Random Fields. Trausti Kristjannson, Aron Culotta, Paul Viola and Andrew McCallum. Nineteenth National Conference on Artificial Intelligence (AAAI 2004). San Jose, CA. (Winner of Honorable Mention Award.) (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction.)
Accurate Information Extraction from Research Papers using Conditional Random Fields. Fuchun Peng and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004. (Applies CRFs to extraction from research paper headers and reference sections, to obtain current best-in-the-world accuracy. Also compares some simple regularization methods.)
Chinese Segmentation and New Word Detection using Conditional Random Fields. Fuchun Peng, Fangfang Feng, and Andrew McCallum. Proceedings of The 20th International Conference on Computational Linguistics (COLING 2004), August 23-27, 2004, Geneva, Switzerland. (State-of-the-art Chinese word segmentation with CRFs, with rich features and many lexicons; also using confidence estimation to add new words to the lexicon.)
Confidence Estimation for Information Extraction. Aron Culotta and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004, short paper. (How to provide not only an answer, but a formally-justified confidence in that answer--using constrained forward-backward.)
A Note on Semi-supervised Learning using Markov Random Fields. Wei Li and Andrew McCallum. Technical Note, February 3, 2004. (A general framework for semi-supervised learning in Conditional Random Fields, with a focus on learning the distance metric between instances. Experimental results with collective classification of documents.)

2003

Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton. NIPS*2003 Workshop on Syntax, Semantics, Statistics, 2003. (Workshop version of ICML 2004 paper.)
Classification with Hybrid Generative/Conditional Models. Rajat Raina, Yirong Shen, Andrew Y. Ng, Andrew McCallum. Proceedings of Neural Information Processing Systems (NIPS), 2003. (Train some parameters generatively, some parameters conditionally.)
Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. Wei Li and Andrew McCallum. ACM Transactions on Asian Language Information Processing, 2003. (How we developed a named entity recognition system for Hindi in just a few weeks.)
A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models. Andrew McCallum and David Jensen. IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003. (Describes big-picture motivation and approach for research that performs information extraction and data mining in an integrated fashion, rather than in two separate serial steps. Lays out a major thrust of my current research over a multi-year span.)
Efficiently Inducing Features of Conditional Random Fields. Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2003. (CRFs give you the great power to include the kitchen sink worth of features. How do you decide which ones to include to avoid over-fitting and running out of memory? A formal, information-theoretic approach, with carefully-chosen approximations to make it efficient with millions of candidate features. This technique was key to success in the Hindi work above, as well as work by Pereira's group at UPenn.)
Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Andrew McCallum and Wei Li. Seventh Conference on Natural Language Learning (CoNLL), 2003. (This is the first publication about named entity extraction with CRFs.)
Table Extraction Using Conditional Random Fields. David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Proceedings of the ACM SIGIR, 2003. (Application of CRFs to finding tables in government reports. Uses both language and layout features.)
Object Consolidation by Graph Partitioning with a Conditionally-trained Distance Metric. Andrew McCallum and Ben Wellner. KDD Workshop on Data Cleaning, Record Linkage and Object Consolidation, 2003. (Later, improved version of workshop paper immediately below.)
Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference. Andrew McCallum and Ben Wellner. IJCAI Workshop on Information Integration on the Web, 2003. (A conditionally-trained model of object consolidation, based on graph partitioning with learned edge weights.)
Challenges in information retrieval and language modeling: report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst. James Allan et al. ACM SIGIR Forum, Volume 37 Issue 1, April 2003. (A report about fruitful areas for future work in IR over a five-year time scale.)

2002

Learning with Scope, with Application to Information Extraction and Classification. David Blei, Drew Bagnell and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2002. (Learn highly reliable formatting-based extractors on the fly at test time, using graphical models and variational inference. Describes both generative and conditional versions of the model.)

2001

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum and Fernando Pereira. ICML-2001. (A conditionally-trained model for sequences and other structured data, with global normalization. The original CRF paper. Don't bother reading the section on parameter estimation---use BFGS instead of Iterative Scaling; e.g. see [McCallum UAI 2003].)
Toward Optimal Active Learning through Sampling Estimation of Error Reduction. Nick Roy and Andrew McCallum. ICML-2001. (A leave-one-out approach to active learning.)
Unlocking the Information in Text. Dallan Quass, Andrew McCallum, William Cohen. The Future of Software, Winter 2000/2001. (An overview of text mining for the Web.)

2000

Learning to Understand the Web. William Cohen, Andrew McCallum, Dallan Quass. IEEE Data Engineering Bulletin. September 2000, Vol. 23, No. 3. Pages 17-24.
Automating the Construction of Internet Portals with Machine Learning. Andrew McCallum, Kamal Nigam, Jason Rennie, Kristie Seymore. Information Retrieval Journal, volume 3, pages 127-163. Kluwer. 2000.
Maximum Entropy Markov Models for Information Extraction and Segmentation. Andrew McCallum, Dayne Freitag and Fernando Pereira. ICML-2000.
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Andrew McCallum, Kamal Nigam and Lyle Ungar. KDD-2000.
Information Extraction with HMM Structures Learned by Stochastic Optimization. Dayne Freitag and Andrew McCallum. AAAI-2000.
Creating Customized Authority Lists. Huan Chang, David Cohn and Andrew McCallum. ICML-2000.
Semi-supervised Clustering with User Feedback. David Cohn, Rich Caruana and Andrew McCallum. Unpublished manuscript. (Submitted to AAAI 2000)

1999

Multi-Label Text Classification with a Mixture Model Trained by EM. Andrew McCallum. Revised version of paper appearing in AAAI'99 Workshop on Text Learning.
A Hierarchical Probabilistic Model for Novelty Detection in Text. Doug Baker, Thomas Hofmann, Andrew McCallum and Yiming Yang. Unpublished manuscript. (Submitted to NIPS'99.)
Using Maximum Entropy for Text Classification. Kamal Nigam, John Lafferty, Andrew McCallum. IJCAI'99 Workshop on Information Filtering.
Information Extraction with HMMs and Shrinkage. Dayne Freitag and Andrew McCallum. AAAI'99 Workshop on Machine Learning for Information Extraction.
Learning Hidden Markov Model Structure for Information Extraction. Kristie Seymore, Andrew McCallum, Roni Rosenfeld. AAAI'99 Workshop on Machine Learning for Information Extraction.
Building Domain-Specific Search Engines with Machine Learning Techniques. Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. AAAI-99 Spring Symposium. A related paper was also accepted to IJCAI'99.
Using Reinforcement Learning to Spider the Web Efficiently. Jason Rennie and Andrew McCallum. ICML'99.
Bootstrapping for Text Learning Tasks. Rosie Jones, Andrew McCallum, Kamal Nigam and Ellen Riloff. IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications.

1998

A Comparison of Event Models for Naive Bayes Text Classification. Andrew McCallum and Kamal Nigam. AAAI-98 Workshop on "Learning for Text Categorization".
Improving Text Classification by Shrinkage in a Hierarchy of Classes. Andrew McCallum, Ronald Rosenfeld, Tom Mitchell and Andrew Ng. ICML-98.
Employing EM in Pool-Based Active Learning for Text Classification. Andrew McCallum and Kamal Nigam. ICML-98.
Distributional Clustering of Words for Text Classification. Doug Baker, Andrew McCallum. SIGIR-98.
Text Classification from Labeled and Unlabeled Documents using EM. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3). pp. 103-134. 2000.
Learning to Classify Text from Labeled and Unlabeled Documents. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. AAAI-98.
Learning to Extract Knowledge from the World Wide Web. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, Sean Slattery. AAAI-98.

1997

McCallum, R. Andrew, Efficient Exploration in Reinforcement Learning with Hidden State, AAAI Fall Symposium on "Model-directed Autonomous Systems", 1997.

1996

McCallum, R. Andrew, Hidden State and Reinforcement Learning with Instance-Based State Identification, IEEE Transactions on Systems, Man and Cybernetics (Special issue on Robot Learning), 26(3):464--473, 1996.
McCallum, R. Andrew, Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, in From Animals to Animats, Fourth International Conference on Simulation of Adaptive Behavior, (SAB'96). Cape Cod, Massachusetts. September, 1996.

1995

McCallum, Andrew K., Reinforcement Learning with Selective Perception and Hidden State, PhD. thesis. December, 1995.
McCallum, R. Andrew, Instance-Based Utile Distinctions for Reinforcement Learning, The Proceedings of the Twelfth International Machine Learning Conference (ML'95), Lake Tahoe, CA, 1995.
McCallum, R. Andrew, Instance-Based State Identification for Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS 7), 1995.

1994

McCallum, R. Andrew, First Results with Instance-Based State Identification for Reinforcement Learning, URCS Tech Report 502, 1994.
McCallum, R. Andrew, Reduced Training Time for Reinforcement Learning with Hidden State, The Proceedings of the Eleventh International Machine Learning Workshop (Robot Learning), New Brunswick, NJ, 1994.
McCallum, R. Andrew, Short-Term Memory in Visual Routines for `Off-Road Car Chasing', Working Notes of AAAI Spring Symposium Series, "Toward Physical Interaction and Manipulation", Stanford University, March 21-23, 1994.

1993 and earlier

McCallum, R. Andrew, Overcoming Incomplete Perception with Utile Distinction Memory, The Proceedings of the Tenth International Machine Learning Conference (ML'93), Amherst, MA, 1993.
McCallum, R. Andrew, Learning with Incomplete Selective Perception, Thesis Proposal, URCS Tech Report 453, 1993.
Garrett, Scott, Bianchini, Kontothanassis, McCallum, Thomas, Wisniewski and Luk, Linking Shared Segments, Winter USENIX, San Diego, CA, 1993.
McCallum, R. Andrew, First Results with Utile Distinction Memory for Reinforcement Learning, URCS Tech Report 446, 1992.
McCallum, R. Andrew, Using Transitional Proximity for Faster Reinforcement Learning, The Proceedings of the Ninth International Machine Learning Conference (ML'92), Aberdeen, Scotland, 1992.
Garrett, Bianchini, Kontothanassis, McCallum, Thomas, Wisniewski and Scott, Dynamic Sharing and Backward Compatibility on 64-Bit Machines, URCS Tech Report 418, 1992.
McCallum, R. Andrew, and Spackman, Kent A., Using Genetic Algorithms to Learn Disjunctive Rules from Examples, The Proceedings of the Seventh International Machine Learning Conference (ML'90), Austin, Texas, 1990.

Professor Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. mccallum@cs.umass.edu, +1 413 545-1323 (vox), +1 413 545-1789 (fax)
Contact Bio Vita Publications Talks Projects Lab People Code Data Teaching Personal	Publications 2020 Simultaneously Linking Entities and Extracting Relations from Biomedical Text Without Mention-level Supervision. Trapit Bansal, Pat Verga, Neha Choudhary, Andrew McCallum, Conference of Association for the Advancement of Artificial Intelligence (AAAI) 2020. 2019 Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks. Pedram Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum. Proceedings of Neural Information Processing Systems (NeurIPS) 2019. Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering. Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, Andrew McCallum. EMNLP-IJCNLP Workshop on Machine Reading for Question Answering (MRQA, Best Paper Award), 2019. Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering. Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, Andrew McCallum. ArXiv preprint arXiv:1909.07598, 2019. Scalable Hierarchical Clustering with Tree Grafting. Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael R Glass, Andrew McCallum. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019. Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019. Paper Matching with Local Fairness Constraints. Ari Kobren, Barna Saha, Andrew McCallum. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2019. Smoothing the Geometry of Probabilistic Box Embeddings.. 
Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko and Andrew McCallum. International Conference on Learning Representations (ICLR) Oral presentation, 2019. Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering. Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum, International Conference on Learning Representations (ICLR) 2019. Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension. Rajarshi Das, Tsendsuren Munkhdalai, Eric Xingdi Yuan, Adam Trischler, Andrew McCallum. International Conference on Learning Representations (ICLR) 2019. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders . Andrew Drozdov, Patrick Verga , Mohit Yadav, Mohit Iyyer, and Andrew McCallum. Association of Computational Linguistics (ACL), 2019. Optimal Transport-based Alignment of Learned Character Representations for String Similarity. Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum. Association of Computational Linguistics (ACL), 2019. A2N: Attending to Neighbors for Knowledge Graph Inference. Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum. Association of Computational Linguistics (ACL), 2019. Energy and Policy Considerations for Deep Learning in NLP. Emma Strubell, Ananya Ganesh and Andrew McCallum. Association of Computational Linguistics (ACL), 2019. Supervised Hierarchical Clustering with Exponential Linkage. Nishant Yadav, Ari Kobren, Nichonas Monath, Andrew McCallum. International Conference on Machine Learning (ICML), 2019. Integrating User Feedback under Identity Uncertainty in Knowledge Base Construction. Ari Kobren, Nicholas Monath, Andrew McCallum. Automated Knowledge Base Construction (AKBC), 2019. The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures. 
Sheshera Mysore, Zach Jensen, Edward Kim, Kevin Huang, Haw-Shiuan Chang, Emma Strubell, Jeffrey Flanigan, Andrew McCallum, Elsa Olivetti. LAW XIII 2019: The 13th Linguistic Annotation Workshop (ACL WS), 2019. Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks. Edward Kim, Zach Jensen, Alexander van Grootel, Kevin Huang, Matthew Staib, Sheshera Mysore, Haw-Shiuan Chang, Emma Strubell, Andrew McCallum, Stefanie Jegelka, and Elsa Olivetti. arXiv pre-print 1901.00032, in submission, 2019. 2018 Compact Representation of Uncertainty in Clustering. Craig Greenberg, Nicholas Monath, Ari Kobren, Patrick Flaherty, Andrew McGregor, Andrew McCallum. Neural Information Processing Systems (NIPS), 2018. Embedded-State Latent Conditional Random Fields for Sequence Labeling. Dung Thai, Sree Harsha Ramesh, Shikhar Murty, Luke Vilnis and Andrew McCallum. Conference on Computational Natural Language Learning (CoNLL), 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. Emma Strubell, Patrick Verga, Daniel Andor, David Weiss and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP, Best long paper award). Brussels, Belgium. October 2018. Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets. Nathan Greenberg, Trapit Banasl, Patrick Verga , and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP short). Brussels, Belgium. October 2018. Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings. Haw-Shiuan Chang, Amol Agrawal, AAnanya Ganesh, AAnirudha Desai, Vinayak Mathur, Alfred Hough, and Andrew McCallum. TextGraphs-12: the Workshop on Graph-based Methods for Natural Language Processing, (NAACL HLT WS), 2018. A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset. 
Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael Witbrock. Association for Computational Linguistics Workshop on Machine Reading for Question Answering (ACL WS, Best paper award) 2018. Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL? Emma Strubell and Andrew McCallum. Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (ACL WS). Melbourne, Australia. July 2018. Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures. Luke Vilnis, Xiang Lorraine Li, Shikhar Murty, Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL) 2018. Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking. Shikhar Murty, Patrick Verga, Luke Vilnis, Irena Radovanovic and Andrew McCallum. The 56th Annual Meeting of the Association for Computational Linguistics oral presentation* (ACL) 2018. Learning Conditionally Calibrated Equations of State for Direct Fired sCO2 Cycles with Deep Neural Networks. Luke Vilnis, David Freed, Navid Rafati, Joe Camilo, Andrew McCallum. The 6th International Supercritical CO2 Power Cycles Symposium (sCO2), 2018 Training Structured Prediction Energy Networks with Indirect Supervision Amirmohammad Rooshenas, Aishwarya Kamath, Andrew McCallum. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT NAACL) 2018. Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning. Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola and Andrew McCallum. International Conference on Learning Representations (ICLR) 2018. 
Simultaneously Self-attending to All Mentions for Full-Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell and Andrew McCallum. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2018.
Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection. Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum. Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL HLT) 2018.

2017

Go for a Walk and Arrive at the Answer: Reasoning Over Knowledge Bases with Reinforcement Learning. (Workshop Version, see also ICLR 2018 conference paper.) Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola and Andrew McCallum. Neural Information Processing Systems Workshop on Automated Knowledge Base Construction (AKBC NIPS WS, Best paper award) 2017.
Finer Grained Entity Typing with TypeNet. Shikhar Murty, Patrick Verga, Luke Vilnis, and Andrew McCallum. 6th Workshop on Automated Knowledge Base Construction (AKBC NIPS WS) 2017.
Automatically Extracting Action Graphs From Materials Science Synthesis Procedures. Sheshera Mysore, Edward Kim, Emma Strubell, Ao Liu, Haw-Shiuan Chang, Srikrishna Kompella, Kevin Huang, Andrew McCallum and Elsa Olivetti. NIPS Workshop on Machine Learning for Molecules and Materials. Spotlight talk. (NIPS WS) 2017.
Attending to All Mention Pairs for Full Abstract Biological Relation Extraction. Patrick Verga, Emma Strubell, Ofer Shai, and Andrew McCallum. 6th Workshop on Automated Knowledge Base Construction (AKBC NIPS WS) 2017.
Materials synthesis insights from scientific literature via text extraction and machine learning. Edward Kim, Kevin Huang, Adam Saunders, Andrew McCallum, Gerbrand Ceder, Elsa Olivetti. Chemistry of Materials 29 (21), 9436-9444. 2017.
Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples. Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum. Neural Information Processing Conference (NIPS) 2017.
Improved Representation Learning for Predicting Commonsense Ontologies. Xiang Lorraine Li, Luke Vilnis, Andrew McCallum. International Conference on Machine Learning Workshop on Deep Structured Prediction (ICML WS) 2017.
Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling. Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum. International Conference on Machine Learning Workshop on Deep Structured Prediction (ICML WS) 2017.
Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding. Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum. ArXiv preprint (ArXiv) 2017.
RelNet: End-to-end Modeling of Entities & Relations. Trapit Bansal, Arvind Neelakantan, Andrew McCallum. NIPS Workshop on Automated Knowledge Base Construction (NIPS AKBC WS) 2017.
Dependency Parsing with Dilated Iterated Graph CNNs. Emma Strubell, Andrew McCallum. 2nd Workshop on Structured Prediction for Natural Language Processing (EMNLP WS) 2017.
An Online Hierarchical Algorithm for Extreme Clustering. Ari Kobren, Nicholas Monath, Akshay Krishnamurthy, Andrew McCallum. Proceedings of Knowledge Discovery and Data Mining, oral presentation (KDD oral) 2017.
Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks. Rajarshi Das, Manzil Zaheer, Siva Reddy, Andrew McCallum. Association of Computational Linguistics, short paper (ACL short) 2017.
Fast and Accurate Sequence Labeling with Iterated Dilated Convolutions. Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum. Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017.
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications. Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, Andrew McCallum. (SemEval) 2017.
End-to-End Learning for Structured Prediction Energy Networks. David Belanger, Bishan Yang, Andrew McCallum. International Conference on Machine Learning (ICML) 2017.
Learning a Natural Language Interface with Neural Programmer. Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei. Submitted to the International Conference on Learning Representations (ICLR), 2017.
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks. Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum. European Association of Computational Linguistics (EACL), 2017.
Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema. Patrick Verga, Arvind Neelakantan, Andrew McCallum. European Association of Computational Linguistics (EACL), 2017.

2016

Structured Prediction Energy Networks. David Belanger and Andrew McCallum. International Conference on Machine Learning (ICML), 2016.
Multilingual Relation Extraction using Compositional Universal Schema. Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew McCallum. North American Association of Computational Linguistics (NAACL), 2016.
Ask the GRU: Multi-task Learning for Deep Text Recommendations. Trapit Bansal, David Belanger, Andrew McCallum. Recommender Systems (RecSys), 2016.
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks. Teresa Martin and Fiete Botschen and Ajay Nagesh and Andrew McCallum. NAACL 2016 Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Incorporating Selectional Preferences in Multi-hop Relation Extraction. Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum. NAACL 2016 Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Row-less Universal Schema. Patrick Verga and Andrew McCallum. NAACL Workshop on Automated Knowledge Base Construction (AKBC), 2016.
Extracting Multilingual Relations under Limited Resources: TAC 2016 Cold-Start KB construction and Slot-Filling using Compositional Universal Schema. Haw-Shiuan Chang, Abdurrahman Munir, Ao Liu, Johnny Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell and Andrew McCallum. Text Analysis Conference, Knowledge Base Population (TAC/KBP), 2016.

2015

Structured Prediction Energy Networks. David Belanger, Andrew McCallum. ArXiv pre-print, submitted to ICLR and rejected, 2015.
Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches. Evgeniy Gabrilovich, Ramanathan Guha, Andrew McCallum, Kevin Murphy. AAAI Spring Symposium Series Technical Report, 2015.
Multilingual Relation Extraction using Compositional Universal Schema. Pat Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew McCallum. ArXiv pre-print, submitted to ICLR, 2016.
Word Representations via Gaussian Embedding. Luke Vilnis, Andrew McCallum. International Conference on Learning Representations (ICLR) oral presentation, 2015.
Compositional Vector Space Models for Knowledge Base Inference. Arvind Neelakantan, Benjamin Roth, Andrew McCallum. AAAI Spring Symposium Series (AAAI-SS), 2015.
Bethe Projections for Non-Local Inference. Luke Vilnis, David Belanger, Dan Sheldon, Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI) 2015.
Learning Dynamic Feature Selection for Fast Sequential Prediction. Emma Strubell, Luke Vilnis, Kate Silverstein and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015. Outstanding paper award.
Compositional Vector Space Models for Knowledge Base Completion. Arvind Neelakantan, Benjamin Roth and Andrew McCallum. Annual Meeting of the Association for Computational Linguistics (ACL). Beijing, China. July 2015.

2014

Training for Fast Sequential Prediction Using Dynamic Feature Selection. Emma Strubell, Luke Vilnis, and Andrew McCallum. NIPS Workshop on Modern Machine Learning and NLP (NIPS WS). Montreal, Quebec, Canada. December 2014.
Knowledge Base Completion using Compositional Vector Space Models. Arvind Neelakantan, Benjamin Roth and Andrew McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) 2014 at NIPS. Outstanding Paper Award.
Minimally Supervised Event Argument Extraction using Universal Schema. Benjamin Roth, Emma Strubell, Katherine Silverstein and Andrew McCallum. In 4th Workshop on Automated Knowledge Base Construction (AKBC) at NIPS, Montreal, Quebec, Canada. December 2014.
Universal Schema for Slot-Filling, Cold-Start KBP and Event Argument Extraction: UMass IESL at TAC KBP 2014. Benjamin Roth, Emma Strubell, John Sullivan, Lakshmi Vikraman, Katherine Silverstein, and Andrew McCallum. Text Analysis Conference (Knowledge Base Population Track) '14 Workshop (TAC KBP). Gaithersburg, Maryland, USA. November 2014.
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. Arvind Neelakantan, Jeevan Shankar, Alexandre Passos and Andrew McCallum. Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2014.
A Hierarchical Model for Universal Schema Relation Extraction. Arvind Neelakantan, Alexandre Passos, Andrew McCallum. Workshop on Automatic Creation and Curation of Knowledge Bases (WACCK) at SIGMOD, 2014.
Message Passing for Soft Constraint Dual Decomposition. David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum. Uncertainty in Artificial Intelligence (UAI), 2014.
Lexicon Infused Phrase Embeddings for Named Entity Resolution. Alexandre Passos, Vineet Kumar, Andrew McCallum. Conference on Computational Natural Language Learning (CoNLL), 2014.
Learning Soft Linear Constraints with Application to Citation Field Extraction. Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum. Proceedings of the Association for Computational Linguistics (ACL), 2014.

2013

Optimization and Learning in FACTORIE. Alexandre Passos, Luke Vilnis, Andrew McCallum. Neural Information Processing Systems Workshop on Optimization for Machine Learning (NIPS WS), 2013.
Marginal Inference in MRFs using Frank-Wolfe. David Belanger, Dan Sheldon, Andrew McCallum. Neural Information Processing Systems Workshop on Greedy Optimization, Frank-Wolfe and Friends (NIPS WS), 2013.
Anytime Belief Propagation Using Sparse Domains. Sameer Singh, Sebastian Riedel, Andrew McCallum. Neural Information Processing Systems Workshop on Resource-Efficient Machine Learning (NIPS WS), 2013.
Universal Schema for Slot Filling and Cold Start: UMass IESL at TACKBP. Sameer Singh, David Belanger, Ari Kobren, Michael Wick, Alexandre Passos, Harshal Pandya, Jinho Choi, Brian Martin, Andrew McCallum. Text Analysis Conference (TAC), 2013.
Universal Schema for Entity Type Prediction. Limin Yao, Sebastian Riedel, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
A Joint Model for Discovering and Linking Entities. Michael Wick, Sameer Singh, Harshal Pandya, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Assessing Confidence of Knowledge Base Content with an Experimental Study in Entity Resolution. Michael Wick, Sameer Singh, Ari Kobren, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Joint Inference of Entities, Relations, and Coreference. Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, Andrew McCallum. Third International Workshop on Automated Knowledge Base Construction (AKBC), 2013.
Dynamic Knowledge Base Alignment for Coreference Resolution. Jiaping Zheng, Luke Vilnis, Sameer Singh, Jinho Choi, Andrew McCallum. Seventeenth Conference on Computational Natural Language Learning (CoNLL), 2013.
Transition-based Dependency Parsing with Selectional Branching. Jinho D. Choi, Andrew McCallum. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
Open Scholarship and Peer Review: a Time for Experimentation. David Soergel, Adam Saunders, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
A New Dataset for Fine-Grained Citation Field Extraction. Sam Anzaroot, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
Large-scale Author Coreference via Hierarchical Entity Representations. Michael L Wick, Ari Kobren, Andrew McCallum. ICML Workshop on Peer Reviewing and Publishing Models (PEER), 2013.
Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia. Sameer Singh, Amar Subramanya, Fernando Pereira, Andrew McCallum. Technical Report (TR) UMASS-CS-2012-015, October, 2012.
Relation Extraction with Matrix Factorization and Universal Schemas. Sebastian Riedel, Limin Yao, Benjamin M. Marlin and Andrew McCallum. Joint Human Language Technology Conference/Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2013.
Latent Relation Representations for Universal Schemas. Sebastian Riedel, Limin Yao, Andrew McCallum. International Conference on Learning Representations (ICLR), 2013.

2012

MAP Inference in Chains using Column Generation. David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum. Proceedings of Neural Information Processing (NIPS), 2012.
Probabilistic Databases of Universal Schema. Limin Yao, Sebastian Riedel and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC), 2012.
Human Machine Cooperation with Epistemological DBs: Supporting User Corrections to Automatically Constructed KBs. Michael Wick, Karl Schultz, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012. (Best paper runner-up.)
Monte Carlo MCMC: Efficient Inference by Sampling Factors. Sameer Singh, Michael Wick, and Andrew McCallum. NAACL Workshop on Automatic Knowledge Base Construction (AKBC) 2012.
Monte Carlo MCMC: Efficient Inference by Approximate Sampling. Sameer Singh, Michael Wick, Andrew McCallum. Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP), 2012.
Combining joint models for biomedical event extraction. David McClosky, Sebastian Riedel, Mihai Surdeanu, Andrew McCallum, Christopher Manning. BMC Bioinformatics, 2012.
Speeding up MAP with Column Generation and Block Regularization. David Belanger, Alexandre Passos, Sebastian Riedel and Andrew McCallum. ICML Workshop on Inferning: Interactions between Inference and Learning (ICML WS), 2012.
Parse, Price and Cut - Delayed Column and Row Generation for Graph Based Parsers. Sebastian Riedel, David A. Smith and Andrew McCallum. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.
A Discriminative Hierarchical Model for Fast Coreference at Large Scale. Michael Wick, Sameer Singh, Andrew McCallum. Association for Computational Linguistics (ACL), 2012.
Unsupervised Relation Discovery with Sense Disambiguation. Limin Yao, Sebastian Riedel and Andrew McCallum. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), 2012.
Topic Models for Taxonomies. Anton Bakalov, Andrew McCallum, Hanna Wallach and David Mimno. Proceedings of the Joint Conference on Digital Libraries (JCDL), 2012.
Selecting Actions for Resource-bounded Information Extraction using Reinforcement Learning. Pallika Kanani, Andrew McCallum. Web Search and Data Mining (WSDM), 2012.

2011

Correlations and anticorrelations in LDA inference. Alexandre Passos, Hanna Wallach, Andrew McCallum. Neural Information Processing Systems Workshop on Challenges in Learning Hierarchical Models: Transfer Learning and Optimization (NIPS WS), 2011.
Inducing Value Sparsity for Parallel Inference in Tree-shaped Models. Sameer Singh, Brian Martin, Andrew McCallum. Neural Information Processing Systems Workshop on Computational Trade-offs in Statistical Learning (NIPS WS), 2011.
Towards Asynchronous Distributed MCMC Inference for Large Graphical Models. Sameer Singh, Andrew McCallum. Neural Information Processing Systems Workshop on Algorithms, Systems, and Tools for Learning at Scale (NIPS WS), 2011.
Query Aware McMC. Michael Wick and Andrew McCallum. Proceedings of Neural Information Processing Systems (NIPS), 2011.
Toward Interactive Training and Evaluation. Greg Druck and Andrew McCallum. Conference on Information and Knowledge Management (CIKM), 2011.
Model Combination for Event Extraction in BioNLP. Sebastian Riedel, David McClosky, Mihai Surdeanu, Christopher D. Manning and Andrew McCallum. Proceedings of the Natural Language Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation. Sebastian Riedel and Andrew McCallum. Proceedings of the Natural Language Processing in Biomedicine NAACL 2011 Workshop (BioNLP), 2011.
Inter-Event Dependencies support Event Extraction from Biomedical Literature. Roman Klinger, Sebastian Riedel and Andrew McCallum. Mining Complex Entities from Network and Biomedical Data (MIND), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2011.
Structured Relation Discovery using Generative Models. Limin Yao, Aria Haghighi, Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
Fast and Robust Joint Models for Biomedical Event Extraction. Sebastian Riedel, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
Optimizing Semantic Coherence in Topic Models. David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2011.
SampleRank: Training Factor Graphs with Atomic Gradients. Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum. Proceedings of the International Conference on Machine Learning (ICML), 2011.
Database of NIH grants using machine-learned categories and graphical clustering. Edmund M Talley, David Newman, David Mimno, Bruce W Herr II, Hanna M Wallach, Gully Burns, Miriam Leenders, Andrew McCallum. Nature Methods, 8, 443–444, 27 May 2011.
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models. Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum. Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011.

2010

An Introduction to Conditional Random Fields. Charles Sutton, Andrew McCallum. Foundations and Trends in Machine Learning (FnT ML), to appear.
Distantly labeling data for large scale cross-document coreference. Sameer Singh, Michael Wick, Andrew McCallum. Technical report on arXiv (TR), 2010.
Distributed MAP Inference for Undirected Graphical Models. Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum. Neural Information Processing Systems Workshop on Learning on Cores, Clusters, and Clouds (NIPS WS), 2010.
Machine Translation Using Overlapping Alignments and SampleRank. Benjamin Roth, Andrew McCallum, Marc Dymetman and Nicola Cancedda. Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (AMTA), 2010.
High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models. Gregory Druck, Andrew McCallum. International Conference on Machine Learning (ICML), 2010.
Constraint-Driven Rank-Based Learning for Information Extraction. Sameer Singh, Limin Yao, Sebastian Riedel, Andrew McCallum. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT).
Collective Cross-Document Relation Extraction Without Labelled Data. Limin Yao, Sebastian Riedel, Andrew McCallum. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2010.
Modeling Relations and Their Mentions without Labeled Text. Sebastian Riedel, Limin Yao, Andrew McCallum. Proceedings of the European Conference on Machine Learning (ECML/PKDD), 2010.
Resource-bounded Information Extraction: Acquiring Missing Feature Values On Demand. Pallika H. Kanani, Andrew McCallum, Shaohan Hu. Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2010. (Best paper runner-up.)
Scalable Probabilistic Databases with Factor Graphs and MCMC. Michael Wick, Andrew McCallum, Gerome Miklau. Proceedings of the International Conference on Very Large Databases (VLDB), 2010.

2009

FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs. Andrew McCallum, Karl Schultz, Sameer Singh. Neural Information Processing Systems (NIPS), 2009.
Rethinking LDA: Why Priors Matter. Hanna Wallach, David Mimno, Andrew McCallum. Neural Information Processing Systems (NIPS), 2009.
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference. Michael Wick, Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum. Neural Information Processing Systems (NIPS), 2009.
SampleRank: Learning Preferences from Atomic Gradients. Michael Wick, Khashayar Rohanimanesh, Aron Culotta, Andrew McCallum. Neural Information Processing Systems Workshop on Advances in Ranking (NIPS WS), 2009.
Bi-directional Joint Inference for Entity Resolution and Segmentation using Imperatively-Defined Factor Graphs. Sameer Singh, Karl Schultz, Andrew McCallum. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2009.
Efficient Methods for Topic Model Inference on Streaming Document Collections. Limin Yao, David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2009, Paris, France.
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment. Kedar Bellare and Andrew McCallum. Proceedings of Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
Polylingual Topic Models. David Mimno, Hanna Wallach, Jason Naradowsky, David Smith and Andrew McCallum. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), Singapore, 2009.
Active Learning by Labeling Features. Gregory Druck, Burr Settles, Andrew McCallum. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP).
Inference and Learning in Large Factor Graphs with Adaptive Proposal Distributions. Khashayar Rohanimanesh, Michael Wick, Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2009-008 (TR), 2009.
Advances in Learning and Inference for Partition-wise Models of Coreference Resolution. Michael Wick and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2009-028 (TR), 2009.
Representing Uncertainty in Databases with Scalable Factor Graphs. Michael Wick, Masters Thesis/Synthesis. Readers: Andrew McCallum and Gerome Miklau. April 2009.
An Entity Based Model for Coreference Resolution. Michael Wick, Aron Culotta, Khashayar Rohanimanesh, Andrew McCallum. Proceedings of the SIAM International Conference on Data Mining (SDM), Reno, Nevada, 2009.
Alternating Projections for Learning with Expectation Constraints. Kedar Bellare, Gregory Druck and Andrew McCallum. Uncertainty in Artificial Intelligence (UAI), 2009.
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria. Gregory Druck, Gideon Mann, Andrew McCallum. Proceedings of the Association for Computational Linguistics (ACL).
Towards Theoretical Bounds for Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani, Andrew McCallum, Ramesh Sitaraman. UMass TechReport UM-CS-2009-027 (TR), 2009.
Generalized Expectation Criteria with application to Semi-Supervised Classification and Sequence Modeling. Gideon Mann and Andrew McCallum. Journal of Machine Learning Research (JMLR). To appear. 2008 Reinforcement Learning for MAP Inference in Large Factor Graphs. Khashayar Rohanimanesh, Michael Wick, Sameer Singh, and Andrew McCallum. University of Massachusetts Technical Report #UM-CS-2008-040 (TR), 2008 Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. David Mimno, Hanna Wallach and Andrew McCallum. NIPS Workshop on Analyzing Graphs, (NIPS WS), 2008, Whistler, BC. FACTORIE: Efficient Probabilistic Programming for Relational Factor Graphs via Imperative Declarations of Structure, Inference and Learning. Andrew McCallum, Khashayar Rohanemanesh, Michael Wick, Karl Schultz, Sameer Singh. NIPS Workshop on Probabilistic Programming, (NIPS WS), 2008. (Discriminatively trained undirected graphical models, or conditional random fields, have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. Although there has been much historic interest in the combination of logic and probability, we argue that in this mixture 'logic' is largely a red herring. The power in relational models is in their repeated structure and tied parameters; and logic is not necessarily the best way to define these structures. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an object-oriented imperative language to express various aspects of model structure, inference and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. 
We have implemented our ideas in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional JVM language named Scala.) A Discriminative Approach to Ontology Alignment. Michael Wick, Khashayar Rohanimanesh, Andrew McCallum, and AnHai Doan. In the International Workshop on New Trends in Information Integration (NTII) at the conference for Very Large Databases (VLDB WS), Auckland, New Zealand, 2008. (New state-of-the-art results on ontology alignment using graph-shaped conditional random fields, joint inference, and parameter estimation by Rank-Based Training.) A Unified Approach for Schema Matching, Coreference, and Canonicalization. Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Information integration, performing joint inference over schema matching, entity resolution and canonicalization, using conditional random fields, features encoding clauses in first-order logic, and efficient inference by Metropolis-Hastings. Positive experimental results on multiple data sets.) Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum. In Conference on Knowledge Discovery and Data Mining (KDD). 2008. (Hierarchical Dirichlet process model that jointly clusters citation venue strings based on both string-edit distance and title information.) Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors. Hanna Wallach, Charles Sutton, Andrew McCallum. In International Conference on Machine Learning, Workshop on Prior Knowledge for Text and Language Processing. (ICML WS), 2008. (Two Bayesian dependency parsing models: 1. Model with Pitman-Yor prior that significantly improves Eisner's classic model; 2. Latent-variable model that learns "syntactic" topics.) Learning from Labeled Features using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. 
Proceedings of ACM Special Interest Group on Information Retreival, (SIGIR), 2008. (Learn classifiers by labeling features rather than instances. Extensive evaluation on many text data sets, showing substantial improvement over other methods of semi-supervised learning.) Learning to Predict the Quality of Contributions to Wikipedia. Gregory Druck, Gerome Miklau and Andrew McCallum. AAAI Workshop on Wikipedia and AI, (AAAI WS), 2008. (Predict the longevity of an edit to Wikipedia, using textual features of the edit as well as features of the editor. Could be part of a tool to prioritize verification of changes to Wikipedia.) Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. David Mimno and Andrew McCallum. (Plenary presentation.) Conference on Uncertainty in Artificial Intelligence, (UAI), 2008. (Text documents are usually accompanied by metadata, such as the authors, the publication venue, the date, and any references. Work in topic modeling that has taken such information into account, such as Author-Topic, Citation-Topic, and Topic-over-Time models, has generally focused on constructing specific models that are suited only for one particular type of metadata. This paper presents a simple, unified model for learning topics from documents given arbitrary non-textual features, which can be discrete, categorical, or continuous.) Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields. Gideon Mann and Andrew McCallum. Proceedings of Association of Computational Linguistics, (ACL), 2008. (Generalized expectation for semi-supervised learning of linear-chain conditional random fields.) Piecewise Training for Structured Prediction. Charles Sutton and Andrew McCallum. Accepted to the Machine Learning Journal, (MLJ), 2008. (Efficiently train CRFs in parts. It works well even though full joint inference is used at test time.) Pachinko Allocation: Scalable Mixture Models of Topic Correlations. 
Wei Li and Andrew McCallum. Submitted to the Journal of Machine Learning Research, (JMLR), 2008. (The pachinko allocation model represents nested correlations among topics using a DAG. This paper has work is in efficiently fitting these models, (as well as plain old LDA) by creating and leveraging sparsity in the distribution over topics to be sampled for each document.) 2007 Unsupervised Coreference of Publication Venues . Robert Hall, Charles Sutton and Andrew McCallum. University of Massachusetts Amherst Technical Report, (TR), 2007. (A generative non-parametric mixture model for entity resolution of publication venues that leverages both the venue titles as well as distributions over words in paper titles.) Generalized Expectation Criteria. Andrew McCallum, Gideon Mann and Gregory Druck. University of Massachusetts Amherst Technical Report #2007-60, (TR), 2007. (This note introduces and motivates Generalized Expectation (GE) criteria. GE criteria are terms in a parameter-estimation objective function that express preferences about model expectations. In certain simple cases, GE falls into the same equivalence class as moment matching, maximum likelihood and maximum entropy estimation. However, our work focusses on leveraging GE's special flexibility in three non-traditional ways: (1) GE criteria can be specified indepently of the model parameterization. In factor graphs, we break the traditional one-to-one mapping between (a) subsets of variables participating in parametered model factors and (b) subsets of variables over which the objective function's expectations are calculated. (2) Within the same objective function, multiple GE terms that are conditional expectations can be conditioned on multiple different data sets. This is useful for semi-supervised learning and transfer learning. (3) A target expectation (or more generally the expectation preference function can come from any source, including other tasks or human domain knowledge. 
GE is the successor to Expectation Regularization, which is described in our ICML 2007 paper below.) Reducing Annotation Effort using Generalized Expectation Criteria--DRAFT. Gregory Druck, Gideon Mann and Andrew McCallum. University of Massachusetts Amherst Technical Report #2007-62, (TR), 2007. (A version of Generalized Expectation (GE) in which the supervision is provided by labeling features instead of instances. Dramatically faster wall-clock labeling to acheive high accuracy. Experiments on document classification.) Community-based Link Prediction with Text. David Mimno, Hanna M. Wallach and Andrew McCallum. In Proceedings of the NIPS 2007 Workshop on Statistical Network Modeling (NIPS WS), 2007. (New state-of-the-art results in link-prediction using a latent-variable topic model, in which "community" variables are associated with topic distributions and author distributions. Thus the model combines the use of language/topics and co-authorships to discover communities.) Leveraging Existing Resources using Generalized Expectation Criteria. Gregory Druck, Gideon Mann and Andrew McCallum. NIPS Workshop on Learning Problem Design, (NIPS WS), 2007. (Generalized Expectation applied in situations in which there is no labeled data. All supervision is obtained form existing auxiliary resources such as lexicons. Experiments on information extraction.) Lightly-Supervised Attribute Extraction for Web Search. Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum and Mark Dredze. NIPS Workshop on Machine Learning for Web Search, (NIPS WS), 2007. (Extract a large number of attributes of different entities from natural language text. Methods based on co-training and maximum entropy classifiers.) People-LDA: Anchoring Topics to People Using Face Recognition. Vidit Jain, Erik Learned-Miller, and Andrew McCallum. International Conference on Computer Vision (ICCV), 2007. 
(Jointly model people's identity, face appearance in an image, and surrounding text in the image captions with an LDA-style topic model. Improved results in identifying coherent sets of person "mentions"---that is, improved co-reference by using both text and image features.)
Joint Group and Topic Discovery from Relations and Text. Andrew McCallum, Xuerui Wang and Natasha Mohanty, Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science 4503, pp. 28-44, (Book chapter), 2007. (Book chapter version of NIPS 2006 conference paper. Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval. Xuerui Wang, Andrew McCallum and Xing Wei, Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), 2007. (A topic model in the LDA style that uses a Markov model to automatically discover topically-relevant arbitrary-length phrases, not just lists of single words. The phrase discovery is not simply a post-processing step, but an intrinsic part of the model that helps it discover better topics. Experiments on document retrieval tasks.)
Canonicalization of Database Records using Adaptive Similarity Measures. Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007.
(Defines and explores the problem of "canonicalization"---selecting the best field values for a single, standard record formed from a set of consolidated, co-resolved information sources, such as arise from merging databases, or combining multiple sources of information extraction.)
Generalized Component Analysis for Text with Heterogeneous Attributes. Xuerui Wang, Chris Pal and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (A topic model based on an undirected graphical model, which makes it easier to incorporate multiple modalities.)
Semi-Supervised Classification with Hybrid Generative/Discriminative Methods. Greg Druck, Chris Pal, Xiaojin Zhu and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (Leverage unlabeled data for text classification by using an objective function that combines (1) joint probability of labels and words and (2) conditional probability of labels given words.)
Expertise Modeling for Matching Papers with Reviewers. David Mimno and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD), 2007. (The Author-Persona-Topic model is an LDA-style topic model especially designed to represent expertise as a mixture of topical intersections. We show positive results in matching reviewers to conference papers, as assessed by human judgements.)
Learning Extractors from Unlabeled Text using Relevant Databases. Kedar Bellare and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Use conditional random fields to learn information extractors both from DB fields and from alignments of DB in free text. Uses an Alignment CRF, similar to our UAI 2005 paper.)
Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes. Pallika Kanani and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007.
(Improve entity resolution by adding web pages as new "mentions" to the graph-partitioning problem, and do so efficiently by selecting a subset of the possible queries and a subset of the returned pages.)
Probabilistic Representations for Integrating Unreliable Data Sources. David Mimno and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Probabilistic representation of field values used in merging and augmenting information from DBLP and research paper PDFs.)
Author Disambiguation using Error-Driven Machine Learning With a Ranking Loss Function. Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew McCallum. Sixth International Workshop on Information Integration on the Web (IIWeb), collocated with AAAI, 2007. (Entity resolution of people using high-order features, made efficient with Metropolis-Hastings and SampleRank, a learning method based on ranking.)
Nonparametric Bayes Pachinko Allocation. Wei Li, David Blei and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (A version of pachinko allocation that automatically determines the number of topics (and super-topics), and its sparse connectivity structure by Dirichlet process priors. Positive results in rediscovering known structure in synthetic data, and in held-out likelihood versus PAM, hLDA and HDP.)
Improved Dynamic Schedules for Belief Propagation. Charles Sutton and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2007. (Significantly faster inference in graphical models by selecting which BP messages to send based on an approximation to their residual.)
Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. Gideon Mann and Andrew McCallum. International Conference on Machine Learning (ICML), 2007.
(Semi-supervised learning is seldom used in real applications because it is often complicated to implement, fragile in tuning or inefficient for large data. We introduce a new highly usable approach to semi-supervised learning, augmenting traditional label log-likelihood with an additional term that encourages model predictions on unlabeled data to match certain expectations. Positive results on 5 data sets versus EM, transductive SVM, entropy regularization and a graph-based method.)
Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. ICML, 2007. (Train a large CRF five times faster by dividing it into separate pieces and reducing numbers of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
Mixtures of Hierarchical Topics with Pachinko Allocation. David Mimno, Wei Li and Andrew McCallum. ICML, 2007. (From a large document collection automatically discover topic hierarchies, where documents may be flexibly represented as mixtures across multiple leaves, not just mixtures up and down a single leaf-root path. Thus, for example, we can represent a document about instructing a robot in natural language, where those two topics are leaves. This new model, hPAM, combines the best of pachinko allocation (PAM) and hierarchical LDA (hLDA). Dramatic improvements in held-out data likelihood and mutual information between discovered topics and human-assigned categories.)
Transfer Learning for Enhancing Information Flow in Organizations and Social Networks. Chris Pal, Xuerui Wang and Andrew McCallum. Submitted to Conference on Email and Anti-Spam (CEAS), 2007. Technical Note. (Continuous hidden variable conditional random field for CC prediction/suggestion in email.)
Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email. Andrew McCallum, Xuerui Wang and Andres Corrada-Emmanuel.
Journal of Artificial Intelligence Research (JAIR), 2007. (Journal paper version of IJCAI conference paper on Author-Recipient-Topic (ART) model.)
Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields. Gideon Mann and Andrew McCallum. NAACL/HLT, (short paper) 2007. (A new, faster dynamic program for calculating the entropy of a finite-state subsequence and its gradient.)
First-Order Probabilistic Models for Coreference Resolution. Aron Culotta, Michael Wick, Robert Hall and Andrew McCallum. NAACL/HLT, 2007. (Traditional coreference uses features only over pairs of mentions. Here we present a conditional random field with first-order logic for expressing features, enabling features over sets of mentions. The result is a new state-of-the-art result on ACE 2004 coref, jumping from 69 to 79---a 45% reduction in error. The advance depends crucially on a new method of parameter estimation for such "weighted logic" models based on learning rankings and error-driven training.)
Sparse Message Passing Algorithms for Weighted Maximum Satisfiability. Aron Culotta, Andrew McCallum, Bart Selman, Ashish Sabharwal. New England Student Symposium on Artificial Intelligence (NESCAI), 2007. (A new algorithm for solving weighted maximum satisfiability (WMAX-SAT) problems that divides a large problem into sub-problems, and coordinates the global solution by message passing with sparse messages. Inspired by the desire to do joint-inference in (a) large weighted logics à la Markov Logic Networks, (b) large NLP pipelines, in which there are efficient pre-existing (dynamic programming) solutions to sub-parts of the pipeline. Positive results versus WalkSAT!)
Cryptogram Decoding for OCR using Numerization Strings. Gary Huang, Erik Learned-Miller and Andrew McCallum. ICDAR, 2007. (Robust OCR without font appearance models by incorporating language modeling.)
Penn/UMass/CHOP BiocreativeII Systems.
Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, and Peter White. BiocreativeII Evaluation Workshop. 2007. (Description of our high-ranking entry in the competition for extraction and linkage from bioinformatics text.)
Resource-bounded Information Gathering for Correlation Clustering. Pallika Kanani and Andrew McCallum. Conference on Computational Learning Theory (COLT) Open Problems Track, 2007. (We present a new class of problems in which the goal is to perform correlational clustering under circumstances in which accuracy can be improved by augmenting the given graph with additional information.)
Organizing the OCA: Learning faceted subjects from a library of digital books. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (Introduces the DCM-LDA topic model, which represents topics by a Dirichlet-compound-multinomial rather than a multinomial. In addition to obtaining interesting information about the different variances of the topics, this model lends itself to efficient parallelization with very coarse-grained synchronization. The result is a topic model that can run on over 1 billion words in just a few hours.)
Mining a digital library for influential authors. David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL), 2007. (A probabilistic model that ranks authors based on their influence on particular areas of scientific research. Integrates topics with citation patterns.)
Improving Author Coreference by Resource-bounded Information Gathering from the Web. Pallika Kanani, Andrew McCallum and Chris Pal. International Joint Conference on Artificial Intelligence (IJCAI), 2007. (Sometimes there is simply insufficient information to make an accurate entity resolution decision, and we must gather additional evidence.
This paper describes the use of web queries to improve research paper author coreference, exploring two methods of augmenting a graph partitioning problem: using the web to obtain new features on existing edges, and using the web to obtain new nodes in the graph. We then go on to describe decision-theoretic approaches for maximizing accuracy gain with a limited budget of web queries, and demonstrate our methods on three large data sets.)
Dynamic Conditional Random Fields. Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh. Journal of Machine Learning Research (JMLR), Vol. 8(Mar), pages 693-723, 2007. (Journal paper version of ICML paper by the same authors, with new experiments on marginal likelihood training.)

2006

On Discriminative and Semi-Supervised Dimensionality Reduction. Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck and Andrew McCallum. Advances in Neural Information Processing Systems, Workshop on Novel Applications of Dimensionality Reduction, (NIPS Workshop), 2006. (Using Multi-Conditional Learning, learn to distribute mixture components just where needed to address some discriminative task. See compelling figure on synthetic overlapping spiral data.)
Learning Field Compatibilities to Extract Database Records from Unstructured Text. Michael Wick, Aron Culotta and Andrew McCallum. Empirical Methods in Natural Language Processing (EMNLP), 2006. (Record extraction, jointly accounting for multi-field compatibility by content and layout features.)
Tractable Learning and Inference with Higher-Order Representations. Aron Culotta and Andrew McCallum. ICML Workshop on Open Problems in Statistical Relational Learning, 2006. (When working with CRFs having features based on first-order logic, the "unrolled" graphical model would be far too large to fully instantiate. This paper describes a method leveraging MCMC to perform inference and learning while only partially instantiating the model.
Positive results on entity resolution (of research paper authors) are described.)
Corrective Feedback and Persistent Learning for Information Extraction. Aron Culotta, Trausti Kristjansson, Andrew McCallum, Paul Viola. Artificial Intelligence Journal (AIJ), volume 170, pages 1101-1122, 2006. (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction. Journal paper version of AAAI paper by the same authors below. Adds experiments with active learning.)
CC Prediction with Graphical Models. Chris Pal and Andrew McCallum. Conference on Email and Anti-Spam (CEAS), 2006. (Help keep an organization coordinated by suggesting who to carbon-copy on your outgoing email message.)
Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta, Andrew McCallum. HLT Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006. (Markov Logic Networks are Conditional Random Fields that use first-order logic to define features and parameter tying patterns. Making such models scale to non-trivial data set sizes is a challenge because the size of the full instantiation of the model is exponential in the arity of the formulae. Here we describe a method of partial instantiation that allows such models to scale to entity resolution problems with millions of entity mentions. On both citation and author entity resolution problems we show that including such first-order features provides increases in accuracy.)
A Continuous-Time Model of Topic Co-occurrence Trends. Xuerui Wang, Wei Li, and Andrew McCallum. AAAI Workshop on Event Detection, 2006. (Capture the time distributions not only of topics, but also of their co-occurrences.
For example, notice that while NLP and ML have both been around for a long time, their co-occurrence has been rising recently. The model is effectively a combination of the Pachinko Allocation Model (PAM) and Topics-Over-Time (TOT).)
Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning. Michael Kelm, Chris Pal, and Andrew McCallum. Draft accepted to the International Conference on Pattern Recognition (ICPR), 2006. (Multi-conditional learning explored in the context of computer vision.)
Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. Andrew McCallum, Chris Pal, Greg Druck, Xuerui Wang. AAAI, 2006. (Estimate parameters of an undirected graphical model not by joint likelihood, or conditional likelihood, but by a product of multiple conditional likelihoods. Can act as an improved regularizer. With latent variables, can cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models and (cross-cutting) conditional-training. Improved results on document classification, Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task.)
Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. Wei Li, and Andrew McCallum. ICML, 2006. (An LDA-style topic model that captures correlations between topics, enabling discovery of finer-grained topics. Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary, nested and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and held-out data likelihood.)
Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. Xuerui Wang and Andrew McCallum. Conference on Knowledge Discovery and Data Mining (KDD) 2006.
(A new LDA-style topic model that models trends over time. The meaning of a topic remains fixed and reliable, but its prevalence over time is captured, and topics may thus focus in on co-occurrence patterns that are time-sensitive. Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Improvements in topic saliency and the ability to predict time given words.)
Exploring the Use of Conditional Random Field Models and HMMs for Historical Handwritten Document Recognition. Shaolei L. Feng, R. Manmatha and Andrew McCallum. IEEE International Conference on Document Image Analysis for Libraries (DIAL 06), pp. 30-37. 2006. (Mixed results on CRFs applied to handwritten word recognition.)
Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. HLT-NAACL, 2006. (Separately train CRFs with different subsets of the features, then integrate them at test time---four different variations on the method. Especially make more reliable use of lexicon features and other highly-predictable but brittle features.)
Integrating Probabilistic Extraction Models and Relational Data Mining to Discover Relations and Patterns in Text. Aron Culotta, Andrew McCallum and Jonathan Betz. HLT-NAACL, 2006. (Extract relations from Wikipedia articles. Run data mining on the relational graph to obtain patterns that are predictive of relations---such as "opponent of my opponent is my ally" and "a person is likely to have the same religion as their parents." Then use features derived from these patterns in a second run of extraction that improves accuracy.)
Bibliometric Impact Measures Leveraging Topic Analysis. Gideon Mann, David Mimno and Andrew McCallum. Joint Conference on Digital Libraries (JCDL) 2006. (Use a new topic model that leverages n-grams to discover interpretable, fine-grained topics in over a million research papers.
Use these topic divisions as well as automated citation analysis to extend three existing bibliometric impact measures, and create three new ones: Topical Diversity, Topical Transfer, Topical Precedence.)
An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. Book chapter in Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006. (An overview and introduction to conditional random fields for beginners and experts alike---motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, tips and tricks, an example application to information extraction with a skip-chain structure.)
Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, and Andrew McCallum. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006. (An alternative method for beam-search based on variational principles. Enables not only faster test-time performance of large-state-space CRFs, but this method makes beam search robust enough to be used at training time, enabling dramatically faster learning of discriminative finite-state methods for speech, IE and other applications.)
Table extraction for answer retrieval. Xing Wei, Bruce Croft and Andrew McCallum. Information Retrieval Journal (IRJ), volume 9, issue 5, pages 589-611, November 2006. (Information extraction from tables, using conditional random fields with language and layout features, with application to question answering. Journal paper version of our SIGIR 2003 paper.)
Semi-supervised Text Classification Using EM. Kamal Nigam, Andrew McCallum and Tom Mitchell. Book chapter in Chapelle, O., Zien, A., and Scholkopf, B. (Eds.) Semi-Supervised Learning. MIT Press: Boston. 2006.
(Overview, description, experiments on using expectation maximization with naive Bayes text classifiers for learning from labeled and unlabeled data. A chapter in a book about various methods of semi-supervised learning.)
Group and Topic Discovery from Relations and Their Attributes. Xuerui Wang, Natasha Mohanty and Andrew McCallum. Neural Information Processing Systems (NIPS), 2006. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)

2005

A Note on Topical N-grams. Xuerui Wang and Andrew McCallum. University of Massachusetts Technical Report UM-CS-2005-071, 2005. (Discover topics like Latent Dirichlet Allocation, but model phrases in addition to single words on a per-topic basis. For example, in the Politics topic, "white house" has special meaning as a collocation, while in the RealEstate topic, modeling the individual words is sufficient. Our TNG model produces much cleaner, more interpretable topics.)
Pachinko allocation: A Directed Acyclic Graph for Topic Correlations. Wei Li and Andrew McCallum. NIPS Workshop on Nonparametric Bayesian Methods, 2005. (Similar motivations to Blei and Lafferty's Correlated Topic Model (CTM), but uses a DAG to capture arbitrary and possibly sparse correlations among topics. Interior nodes of the DAG have a Dirichlet distribution over their children; words are in the leaves. Provides improved interpretability and classification, as well as improved held-out likelihood over CTM. See ICML 2006 paper above.)
Direct Maximization of Rank-Based Metrics for Information Retrieval. Don Metzler, W. Bruce Croft and Andrew McCallum. CIIR Technical Report IR-429, 2005.
Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, volume 3, Number 9, November 2005. (An overview of information extraction by machine learning methods, written for people not familiar with machine learning, especially CTOs and other people in business.)
Learning Clusterwise Similarity with First-order Features. Aron Culotta and Andrew McCallum. NIPS Workshop on the Theoretical Foundations of Clustering. 2005. (Discriminatively-trained graph-partitioning methods for clustering, with features over entire clusters, including existential and universal quantifiers. Efficiently instantiate these features only on demand.)
Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks---so that the final target task can actually affect the output of the intermediate task.)
Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common under-appreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements---including new best results on CoNLL-2000 NP Chunking.)
Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005.
(Further results with "piecewise training", a method also described in a UAI'05 paper below.)
Practical Markov Logic Containing First-order Quantifiers with Application to Identity Uncertainty. Aron Culotta and Andrew McCallum. Technical Report IR-430, University of Massachusetts, September 2005. (Use existential and universal quantifiers in Markov Logic, doing so practically and efficiently by incrementally instantiating these terms as needed. Applied to object correspondence, this model combines the expressivity of BLOG with the predictive accuracy advantages of conditional probability training. Experiments on citation matching and author disambiguation.)
Joint Deduplication of Multiple Record Types in Relational Data. Aron Culotta and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Longer Tech Report version: A Conditional Model of Deduplication for Multi-type Relational Data. Technical Report IR-443, University of Massachusetts, September 2005.) (Leverage relations among multiple entity types to perform coreference collectively among all types. Uses CRF-style graph partitioning with a learned distance metric. Experimental results on joint coreference of both citations and their venues showing that accuracy on both improves.)
Collective Multi-Label Classification. Nadia Ghamrawi and Andrew McCallum. Fourteenth Conference on Information and Knowledge Management (CIKM), 2005. (Multi-label document classification with a conditional maximum entropy model that captures not only the traditional dependencies between words and the class labels, but also the co-occurrence dependencies between the class labels. Performs joint inference among all class labels.)
Predictive Random Fields: Latent Variable Models Fit by Multiway Conditional Probability with Applications to Document Analysis. Andrew McCallum, Xuerui Wang and Chris Pal. UMass Technical Report UM-CS-2005-053, version 2.1. 2005.
(Cluster structured, relational data, like Latent Dirichlet Allocation and its successors, but with undirected graphical models that are conditionally-trained. Improved results over Jebara-inspired synthetic data, and over the Harmonium as tested on an information retrieval task. This is an evolving Tech Report, which needs to be updated---in particular we are now referring to this method as "Multi-Conditional Learning" or "Multi-Conditional Mixtures".)
Group and Topic Discovery from Relations and Text. Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation. Yu Gu, Andrew McCallum and Don Towsley. Internet Measurement Conference, 2005. (Build a density model of normal Internet traffic with Maximum Entropy and feature induction. Detect network attacks by density threshold.)
A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance. Andrew McCallum, Kedar Bellare and Fernando Pereira. Conference on Uncertainty in AI (UAI), 2005. (Train a string edit distance function from both positive and negative examples of string pairs (matching and mismatching). Significantly, the model designer is free to use arbitrary, fancy features of both strings, and also very flexible edit operations. This model is an example of an increasingly popular interesting class---conditionally-trained models with latent variables. Positive results on citations, addresses and names.)
Joint Parsing and Semantic Role Labeling. Charles Sutton and Andrew McCallum. CoNLL (Shared Task), 2005. (Attempt to improve accuracy by performing joint inference over parsing and semantic role labeling---preserving uncertainty and multiple hypotheses in Dan Bikel's parser. Unfortunately the effort yielded negative results, most likely because the components needed to produce better calibrated probabilities.)
Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)
Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with Role-Author-Recipient-Topic model.
Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
Piecewise Training for Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently train a large graphical model in separately normalized pieces, and amazingly often obtain higher accuracy than without this approximation. This paper also shows that this piecewise objective is a lower bound on the exact likelihood, and gives results with three different graphical model structures.)
Constrained Kronecker Deltas for Fast Approximate Inference and Estimation. Chris Pal, Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes the graph of the graphical model is not large and complex, but the cardinality of the variables is large. This paper describes a new and generalized method for beam search on graphical models, showing positive experimental results for both inference and training. Experiments on NetTalk.)
Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once--made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)

2004

Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs. Andrew McCallum and Charles Sutton. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-383. 2004.
(Also presented at NIPS 2004 Workshop on Learning with Structured Outputs.) (Large undirected graphical models are expensive to train because they require global inference to calculate the gradient of the parameters. We describe a new method for fast training in locally-normalized pieces. Amazingly the resulting models also give higher accuracy than their globally-trained counterparts.)
Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora. Ron Bekkerman, Andrew McCallum and Gary Huang. UMass CIIR Technical Report IR-418, 2004. (Extensive experiments on real-world email foldering.)
The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang. Technical Report UM-CS-2004-096, 2004. (Also presented at the NIPS'04 Workshop on "Structured Data and Representations in Probabilistic Models for Categorization".) (Social network analysis that not only models links between people, but the word content of the messages exchanged between them. Discovers salient topics guided by the sender-recipient structure in data, and provides improved ability to measure role-similarity between people. A generative model in the style of Latent Dirichlet Allocation.)
Conditional Models of Identity Uncertainty with Application to Noun Coreference. Andrew McCallum and Ben Wellner. Neural Information Processing Systems (NIPS), 2004. (A model of object consolidation, based on graph partitioning with learned edge weights. Conference paper version of 2003 work in KDD Workshop on Data Cleaning.)
An Integrated, Conditional Model of Information Extraction and Coreference with Application to Citation Matching. Ben Wellner, Andrew McCallum, Fuchun Peng, Michael Hay. Conference on Uncertainty in Artificial Intelligence (UAI), 2004.
(A conditionally-trained graphical model for identity uncertainty in relational domains, representing mentions, entities and their attributes. Also a first example of joint inference for extraction and identity uncertainty--coreference decisions actually integrate out uncertainty about information extraction.)
Collective Segmentation and Labeling of Distant Entities in Information Extraction. Charles Sutton and Andrew McCallum. ICML workshop on Statistical Relational Learning, 2004. (Makes the boundaries and types of distant segments inter-dependent by augmenting a linear-chain CRF with additional long, arching edges. Approximate inference by Tree-Reparameterization.)
An Exploration of Entity Models, Collective Classification and Relation Description. Hema Raghavan, James Allan and Andrew McCallum. KDD Workshop on Link Analysis and Group Detection, August 2004. (Part of a student synthesis project: includes an application of RMNs to classifying people in newswire.)
Sign Detection in Natural Images with Conditional Random Fields. Jerod Weinman, Al Hansen and Andrew McCallum. IEEE International Workshop on Machine Learning for Signal Processing, 2004. (Part of a student synthesis project: a grid-shaped CRF with inference by belief-propagation with Tree-Reparameterization.)
Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum. ICML 2004. (Joint inference over two traditionally-separate layers of NLP processing: POS-tagging and NP-chunking. Introduces the CRF analogue of Factorial HMMs.
Compares several approximate inference procedures.)
Interactive Information Extraction with Constrained Conditional Random Fields. Trausti Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Nineteenth National Conference on Artificial Intelligence (AAAI 2004). San Jose, CA. (Winner of Honorable Mention Award.) (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction.)
Accurate Information Extraction from Research Papers using Conditional Random Fields. Fuchun Peng and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004. (Applies CRFs to extraction from research paper headers and reference sections, to obtain current best-in-the-world accuracy. Also compares some simple regularization methods.)
Chinese Segmentation and New Word Detection using Conditional Random Fields. Fuchun Peng, Fangfang Feng, and Andrew McCallum. Proceedings of The 20th International Conference on Computational Linguistics (COLING 2004), August 23-27, 2004, Geneva, Switzerland. (State-of-the-art Chinese word segmentation with CRFs, with rich features and many lexicons; also using confidence estimation to add new words to the lexicon.)
Confidence Estimation for Information Extraction. Aron Culotta and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004, short paper. (How to provide not only an answer, but a formally-justified confidence in that answer--using constrained forward-backward.)
A Note on Semi-supervised Learning using Markov Random Fields. Wei Li and Andrew McCallum. Technical Note, February 3, 2004.
(A general framework for semi-supervised learning in Conditional Random Fields, with a focus on learning the distance metric between instances. Experimental results with collective classification of documents.)

2003

Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton. NIPS 2003 Workshop on Syntax, Semantics, Statistics, 2003. (Workshop version of ICML 2004 paper.)
Classification with Hybrid Generative/Conditional Models. Rajat Raina, Yirong Shen, Andrew Y. Ng, Andrew McCallum. Proceedings of Neural Information Processing Systems (NIPS), 2003. (Train some parameters generatively, some parameters conditionally.)
Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. Wei Li and Andrew McCallum. ACM Transactions on Asian Language Information Processing, 2003. (How we developed a named entity recognition system for Hindi in just a few weeks.)
A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models. Andrew McCallum and David Jensen. IJCAI'03 Workshop on Learning Statistical Models from Relational Data, 2003. (Describes big-picture motivation and approach for research that performs information extraction and data mining in an integrated fashion, rather than in two separate serial steps. Lays out a major thrust of my current research over a multi-year span.)
Efficiently Inducing Features of Conditional Random Fields. Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2003. (CRFs give you the great power to include the kitchen sink worth of features. How do you decide which ones to include to avoid over-fitting and running out of memory? A formal, information-theoretic approach, with carefully-chosen approximations to make it efficient with millions of candidate features.
This technique was key to the success of the Hindi work above, as well as to work by Pereira's group at UPenn.)
Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Andrew McCallum and Wei Li. Seventh Conference on Natural Language Learning (CoNLL), 2003. (This is the first publication about named entity extraction with CRFs.)
Table Extraction Using Conditional Random Fields. David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Proceedings of the ACM SIGIR, 2003. (Application of CRFs to finding tables in government reports. Uses both language and layout features.)
Object Consolidation by Graph Partitioning with a Conditionally-trained Distance Metric. Andrew McCallum and Ben Wellner. KDD Workshop on Data Cleaning, Record Linkage and Object Consolidation, 2003. (Later, improved version of workshop paper immediately below.)
Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference. Andrew McCallum and Ben Wellner. IJCAI Workshop on Information Integration on the Web, 2003. (A conditionally-trained model of object consolidation, based on graph partitioning with learned edge weights.)
Challenges in information retrieval and language modeling: report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst. James Allan et al. ACM SIGIR Forum, Volume 37 Issue 1, April 2003. (A report about fruitful areas for future work in IR over a five-year time scale.)

2002

Learning with Scope, with Application to Information Extraction and Classification. David Blei, Drew Bagnell and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2002. (Learn highly reliable formatting-based extractors on the fly at test time, using graphical models and variational inference. Describes both generative and conditional versions of the model.)

2001

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
John Lafferty, Andrew McCallum and Fernando Pereira. ICML-2001. (A conditionally-trained model for sequences and other structured data, with global normalization. The original CRF paper. Don't bother reading the section on parameter estimation---use BFGS instead of Iterative Scaling; e.g. see [McCallum UAI 2003].)
Toward Optimal Active Learning through Sampling Estimation of Error Reduction. Nick Roy and Andrew McCallum. ICML-2001. (A leave-one-out approach to active learning.)
Unlocking the Information in Text. Dallan Quass, Andrew McCallum, William Cohen. The Future of Software, Winter 2000/2001. (An overview of text mining for the Web.)

2000

Learning to Understand the Web. William Cohen, Andrew McCallum, Dallan Quass. IEEE Data Engineering Bulletin. September 2000, Vol. 23, No. 3. Pages 17-24.
Automating the Construction of Internet Portals with Machine Learning. Andrew McCallum, Kamal Nigam, Jason Rennie, Kristie Seymore. Information Retrieval Journal, volume 3, pages 127-163. Kluwer. 2000.
Maximum Entropy Markov Models for Information Extraction and Segmentation. Andrew McCallum, Dayne Freitag and Fernando Pereira. ICML-2000.
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Andrew McCallum, Kamal Nigam and Lyle Ungar. KDD-2000.
Information Extraction with HMM Structures Learned by Stochastic Optimization. Dayne Freitag and Andrew McCallum. AAAI-2000.
Creating Customized Authority Lists. Huan Chang, David Cohn and Andrew McCallum. ICML-2000.
Semi-supervised Clustering with User Feedback. David Cohn, Rich Caruana and Andrew McCallum. Unpublished manuscript. (Submitted to AAAI 2000.)

1999

Multi-Label Text Classification with a Mixture Model Trained by EM. Andrew McCallum. Revised version of paper appearing in AAAI'99 Workshop on Text Learning.
A Hierarchical Probabilistic Model for Novelty Detection in Text. Doug Baker, Thomas Hofmann, Andrew McCallum and Yiming Yang. Unpublished manuscript. (Submitted to NIPS'99.)
Using Maximum Entropy for Text Classification. Kamal Nigam, John Lafferty, Andrew McCallum. IJCAI'99 Workshop on Information Filtering.
Information Extraction with HMMs and Shrinkage. Dayne Freitag and Andrew McCallum. AAAI'99 Workshop on Machine Learning for Information Extraction.
Learning Hidden Markov Model Structure for Information Extraction. Kristie Seymore, Andrew McCallum, Roni Rosenfeld. AAAI'99 Workshop on Machine Learning for Information Extraction.
Building Domain-Specific Search Engines with Machine Learning Techniques. Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. AAAI-99 Spring Symposium. A related paper was also accepted to IJCAI'99.
Using Reinforcement Learning to Spider the Web Efficiently. Jason Rennie and Andrew McCallum. ICML'99.
Bootstrapping for Text Learning Tasks. Rosie Jones, Andrew McCallum, Kamal Nigam and Ellen Riloff. IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications.

1998

A Comparison of Event Models for Naive Bayes Text Classification. Andrew McCallum and Kamal Nigam. AAAI-98 Workshop on "Learning for Text Categorization".
Improving Text Classification by Shrinkage in a Hierarchy of Classes. Andrew McCallum, Ronald Rosenfeld, Tom Mitchell and Andrew Ng. ICML-98.
Employing EM in Pool-Based Active Learning for Text Classification. Andrew McCallum and Kamal Nigam. ICML-98.
Distributional Clustering of Words for Text Classification. Doug Baker, Andrew McCallum. SIGIR-98.
Text Classification from Labeled and Unlabeled Documents using EM. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3). pp. 103-134. 2000.
Learning to Classify Text from Labeled and Unlabeled Documents. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. AAAI-98.
Learning to Extract Knowledge from the World Wide Web. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, Sean Slattery. AAAI-98.

1997

McCallum, R.
Andrew, Efficient Exploration in Reinforcement Learning with Hidden State, AAAI Fall Symposium on "Model-directed Autonomous Systems", 1997.

1996

McCallum, R. Andrew, Hidden State and Reinforcement Learning with Instance-Based State Identification, IEEE Transactions on Systems, Man and Cybernetics (Special issue on Robot Learning), 26(3):464--473, 1996.
McCallum, R. Andrew, Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, in From Animals to Animats, Fourth International Conference on Simulation of Adaptive Behavior, (SAB'96). Cape Cod, Massachusetts. September, 1996.

1995

McCallum, Andrew K., Reinforcement Learning with Selective Perception and Hidden State, PhD thesis. December, 1995.
McCallum, R. Andrew, Instance-Based Utile Distinctions for Reinforcement Learning, The Proceedings of the Twelfth International Machine Learning Conference (ML'95), Lake Tahoe, CA, 1995.
McCallum, R. Andrew, Instance-Based State Identification for Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS 7), 1995.

1994

McCallum, R. Andrew, First Results with Instance-Based State Identification for Reinforcement Learning, URCS Tech Report 502, 1994.
McCallum, R. Andrew, Reduced Training Time for Reinforcement Learning with Hidden State, The Proceedings of the Eleventh International Machine Learning Workshop (Robot Learning), New Brunswick, NJ, 1994.
McCallum, R. Andrew, Short-Term Memory in Visual Routines for `Off-Road Car Chasing', Working Notes of AAAI Spring Symposium Series, "Toward Physical Interaction and Manipulation", Stanford University, March 21-23, 1994.

1993 and earlier

McCallum, R. Andrew, Overcoming Incomplete Perception with Utile Distinction Memory, The Proceedings of the Tenth International Machine Learning Conference (ML'93), Amherst, MA, 1993.
McCallum, R. Andrew, Learning with Incomplete Selective Perception, Thesis Proposal, URCS Tech Report 453, 1993.
Garrett, Scott, Bianchini, Kontothanassis, McCallum, Thomas, Wisniewski and Luk, Linking Shared Segments, Winter USENIX, San Diego, CA, 1993.
McCallum, R. Andrew, First Results with Utile Distinction Memory for Reinforcement Learning, URCS Tech Report 446, 1992.
McCallum, R. Andrew, Using Transitional Proximity for Faster Reinforcement Learning, The Proceedings of the Ninth International Machine Learning Conference (ML'92), Aberdeen, Scotland, 1992.
Garrett, Bianchini, Kontothanassis, McCallum, Thomas, Wisniewski and Scott, Dynamic Sharing and Backward Compatibility on 64-Bit Machines, URCS Tech Report 418, 1992.
McCallum, R. Andrew, and Spackman, Kent A., Using Genetic Algorithms to Learn Disjunctive Rules from Examples, The Proceedings of the Seventh International Machine Learning Conference (ML'90), Austin, Texas, 1990.