As you can see on my new homepage, I have accepted a tenure-track position in the College of Computer and Information Science at Northeastern University in Boston, starting in September 2012. With colleagues in CS, Social Sciences, and Humanities, I will be starting a Center for Digital Humanities and Computational Social Science.
I am affiliated with the Center for Intelligent Information Retrieval (CIIR). My research interests include natural language processing, machine translation, semi-supervised machine learning methods, information retrieval, and digital libraries.
Formerly: Natural Language Processing at Johns Hopkins University, and Head Programmer, Perseus Project, Tufts University.
See also my curriculum vitae in PDF.
Xiaoye "Tiger" Wu
Spring 2012: Search Engines (CS 446): Tuesdays and Thursdays, 4-5:15, CS Building 142
Fall 2011: Residential Academic Program First-Year Seminar (CS 191a)
Fall 2009: Introduction to Natural Language Processing (CS 585).
Spring 2009: James Allan, R. Manmatha, and I are leading a seminar on Mining Text and Images in Digital Libraries Using Grid Computing.
August 2006: Charles Schafer and I presented a tutorial, Overview of Statistical Machine Translation [pdf], at the Association for Machine Translation in the Americas.
Fall 2005: Noah Smith and I designed and taught a course on Empirical Research Methods in Computer Science.
Jason Naradowsky, Sebastian Riedel, and David A. Smith. Improving NLP through marginalization of hidden syntactic structure. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.
Sebastian Riedel, David A. Smith, and Andrew McCallum. Parse, price and cut: Delayed column and row generation for graph based parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.
Yanchuan Sim, Noah A. Smith, and David A. Smith. Discovering factions in the computational linguistics community. In ACL Workshop on Rediscovering 50 Years of Discoveries, 2012. [ PDF ]
Michael Bendersky and David A. Smith. A dictionary of wisdom and wit: Learning to extract quotable phrases. In NAACL Workshop on Computational Linguistics for Literature, 2012. [ PDF ]
David A. Smith, R. Manmatha, and James Allan. Mining relational structure from millions of books: Position paper. In Proceedings of the CIKM BooksOnline Workshop, pages 49-54, 2011.
Jae-Hyun Park, W. Bruce Croft, and David A. Smith. A quasi-synchronous dependence model for information retrieval. In Conference on Information and Knowledge Management (CIKM), pages 17-26, 2011. [ PDF ]
Jinyoung Kim, W. Bruce Croft, David A. Smith, and Anton Bakalov. Evaluating an associative browsing model for personal information. In Conference on Information and Knowledge Management (CIKM), pages 647-652, 2011. [ PDF ]
Jeffrey Dalton, James Allan, and David A. Smith. Passage retrieval for incorporating global dependencies in sequence labeling. In Conference on Information and Knowledge Management (CIKM), pages 355-364, 2011. [ PDF ]
Kriste Krstovski and David A. Smith. A minimally supervised approach for detecting and ranking document translation pairs. In Proceedings of the Workshop on Statistical Machine Translation, pages 207-216, 2011. [ PDF ]
Michael Bendersky, W. Bruce Croft, and David A. Smith. Joint annotation of search queries. In Proceedings of the Association for Computational Linguistics, pages 102-111, 2011. [ PDF ]
John S. Y. Lee, Jason Naradowsky, and David A. Smith. A discriminative model for joint morphological disambiguation and dependency parsing. In Proceedings of the Association for Computational Linguistics, pages 885-894, 2011. [ PDF ]
Elif Aktolga, James Allan, and David A. Smith. Passage reranking for question answering using syntactic structures and answer types. In European Conference on Information Retrieval (ECIR), pages 617-628, 2011. [ PDF ]
Jinyoung Kim, Anton Bakalov, David A. Smith, and W. Bruce Croft. Building and evaluating a semantic representation for personal information. In Conference on Information and Knowledge Management (CIKM), pages 1741-1744, 2010.
Xiaobing Xue, W. Bruce Croft, and David A. Smith. Query reformulation using query distributions. In Conference on Information and Knowledge Management (CIKM), pages 1497-1500, 2010.
Michael Bendersky, W. Bruce Croft, and David A. Smith. Structural annotation of search queries using pseudo-relevance feedback. In Conference on Information and Knowledge Management (CIKM), pages 1537-1540, 2010. [ PDF ]
Sebastian Riedel, David A. Smith, and Andrew McCallum. Inference by minimizing size, divergence, or their sum. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pages 227-234, 2010. [ PDF ]
Sebastian Riedel and David A. Smith. Relaxed marginal inference and its application to dependency parsing. In Proceedings of the Conference on Human Language Technology of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 760-768, 2010. [ PDF ]
Jangwon Seo, W. Bruce Croft, and David A. Smith. Online community search using thread structure. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), pages 1907-1910, 2009.
David A. Smith and Jason Eisner. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 822-831, 2009. [ PDF | PowerPoint slides ]
David Mimno, Hanna Wallach, Jason Naradowsky, David A. Smith, and Andrew McCallum. Polylingual topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 880-889, 2009. [ PDF ]
Michael Bendersky, W. Bruce Croft, and David A. Smith. Two-stage query segmentation for information retrieval. In Proceedings of the 32nd International ACM SIGIR Conference, pages 810-811, 2009. [ PDF ]
David A. Smith and Jason Eisner. Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 145-156, 2008. [ PDF | PowerPoint slides ]
Keith Hall, Jiří Havelka, and David A. Smith. Log-linear models of non-projective trees, k-best MST parsing and tree-ranking. In Proceedings of the CoNLL Shared Task, pages 962-966, 2007.
David A. Smith and Noah A. Smith. Probabilistic models of nonprojective dependency trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 132-140, 2007. [ PDF | PowerPoint slides ]
David A. Smith and Jason Eisner. Bootstrapping feature-rich dependency parsers with entropic priors. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 667-677, 2007. [ PDF | PowerPoint slides ]
David A. Smith and Jason Eisner. Minimum risk annealing for training log-linear models. In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics, pages 787-794, 2006. [ PDF ]
Markus Dreyer, David A. Smith, and Noah A. Smith. Vine parsing and minimum risk reranking for speed and precision. In Proceedings of the CoNLL Shared Task, pages 201-205, 2006. [ PDF ]
David A. Smith and Jason Eisner. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of the HLT-NAACL Workshop on Statistical Machine Translation, pages 23-30, 2006. [ PDF | PowerPoint slides ]
Noah A. Smith, David A. Smith, and Roy W. Tromble. Context-based morphological disambiguation with random fields. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 475-482, 2005. [ PDF ]
David A. Smith and Noah A. Smith. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 49-56, 2004. [ PDF ]
F.J. Och, D. Gildea, S. Khudanpur, A. Sarkar, K. Yamada, A. Fraser, S. Kumar, L. Shen, D. Smith, K. Eng, V. Jain, Z. Jin, and D. Radev. A smorgasbord of features for statistical machine translation. In Proceedings of the Conference on Human Language Technology and the North American Association for Computational Linguistics, pages 161-168, 2004. [ PDF ]
David A. Smith and Gideon S. Mann. Bootstrapping toponym classifiers. In Proceedings of the HLT-NAACL Workshop on Analysis of Geographic References, pages 45-49, 2003. [ PDF ]
David A. Smith, Anne Mahoney, and Gregory Crane. Integrating harvesting into digital library content. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 183-184, Portland, OR, July 2002. [ PDF ]
David A. Smith. Detecting events with date and place information in unstructured text. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 191-196, Portland, OR, July 2002. [ PDF ]
David A. Smith. Detecting and browsing events in unstructured text. In Proceedings of the 25th Annual ACM SIGIR Conference, pages 73-80, Tampere, Finland, August 2002. [ PDF ]
David A. Smith and Gregory Crane. Disambiguating geographic names in a historical digital library. In Proceedings of the European Conference on Digital Libraries (ECDL), pages 127-136, Darmstadt, Germany, September 2001. [ PDF ]
David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. In Proceedings of Extreme Markup Languages 2000, pages 219-224, Montreal, August 2000.
Gregory Crane, Clifford E. Wulfman, Lisa M. Cerrato, Anne Mahoney, Thomas L. Milbank, David Mimno, Jeffrey A. Rydberg-Cox, David A. Smith, and Christopher York. Towards a cultural heritage digital library. In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2003, pages 75-86, Houston, TX, June 2003. [ PDF ]
Gregory Crane, David A. Smith, and Clifford E. Wulfman. Building a hypertextual digital library in the humanities: A case study on London. In Proceedings of the First ACM+IEEE Joint Conference on Digital Libraries, pages 426-434, Roanoke, VA, June 2001. Best paper award. [ PDF ]
David Bamman and David A. Smith. Extracting two thousand years of Latin from a million book library. ACM Journal on Computing and Cultural Heritage, 5(1), 2012.
Jangwon Seo, W. Bruce Croft, and David A. Smith. Online community search using conversational structures. Information Retrieval, 14(6):547-571, 2011. [ PDF ]
Andrew Kae, David A. Smith, and Erik Learned-Miller. Learning on the fly: A font-free approach towards multilingual OCR. International Journal on Document Analysis and Recognition, 14(3):289-301, 2011. [ PDF ]
David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. Markup Languages: Theory and Practice, 2(3):205-214, 2000. [ PDF ]
David A. Smith, Jeffrey A. Rydberg-Cox, and Gregory R. Crane. The Perseus Project: A digital library for the humanities. Literary and Linguistic Computing, 15(1):15-25, 2000.
David A. Smith. Textual variation and version control in the TEI. Computers and the Humanities, 33(1-2):103-112, 1999.
Gregory R. Crane, Robert F. Chavez, Anne Mahoney, Thomas L. Milbank, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Drudgery and deep thought: Designing a digital library for the humanities. Communications of the Association for Computing Machinery, 44(5):35-40, 2001. [ PDF ]
Xiaoye Wu and David A. Smith. Right-branching tree transformation for eager dependency parsing. Technical Report CIIR-776, University of Massachusetts, 2010. [ PDF ]
Jason Naradowsky, Joe Pater, David Smith, and Robert Staubs. Learning hidden metrical structure with a log-linear model of grammar. In Computational Modelling of Sound Pattern Acquisition, pages 59-60, Edmonton, February 2010. Department of Linguistics, University of Alberta.
Joe Pater, David A. Smith, Robert Staubs, Karen Jesney, and Ramgopal Mettu. Learning hidden structure with a log-linear model of grammar. In Linguistic Society of America (LSA), Baltimore, January 2010.
Gregory Druck and David A. Smith. Computing conditional feature covariance under non-projective tree conditional random fields. Technical Report UM-CS-2009-060, University of Massachusetts, 2009.
David A. Smith. Debabelizing libraries: Machine translation by and for digital collections. D-Lib Magazine, 12(3), March 2006. [ HTML ]
Anne Mahoney, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Generalizing the Perseus XML document manager. In Linguistic Exploration: Workshop on Web-based Language Documentation and Description, Philadelphia, December 2000. [ HTML ]