David Jensen
Director, Knowledge Discovery Laboratory
College of Information & Computer Sciences
University of Massachusetts Amherst


I am a faculty member in the College of Information and Computer Sciences at the University of Massachusetts Amherst.  I direct the Knowledge Discovery Laboratory, which I founded in 2000.  I also serve as the Associate Director of the Computational Social Science Institute, an interdisciplinary effort at UMass to study social phenomena using computational tools and concepts.  From 1991 to 1995, I served as an analyst with the Office of Technology Assessment, an agency of the United States Congress.  I received my doctoral degree from Washington University in St. Louis in 1992.

My current research focuses on machine learning and data science for analyzing large social, technological, and computational systems.  In particular, I study methods for constructing accurate causal models from observational and experimental data, with applications to social science, fraud detection, security, and systems management.  My research is supported by many organizations, including the National Science Foundation, the Defense Advanced Research Projects Agency, and the Intelligence Advanced Research Projects Activity.

I regularly serve on program committees for several conferences, including the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, the IEEE International Conference on Data Mining, the International Conference on Machine Learning, and the Conference on Uncertainty in Artificial Intelligence.  I have also served on the Board of Directors of the ACM Special Interest Group on Knowledge Discovery and Data Mining (2005-2013), the Defense Science Study Group (2006-2007), and DARPA's Information Science and Technology Group (2007-2012).  In 2011, I received the Outstanding Teacher Award from the UMass College of Natural Sciences.


  • I will present a brief invited talk and participate in a panel discussion at the DARPA ISAT Workshop on Machine Learning for Causal Inference in Cambridge (2/11-12/2016).

  • Kaleigh Clary was prominently featured in a New York Times article on the “Hack the Dinos” Challenge at the American Museum of Natural History. (1/19/2016)

  • Daniel Grant successfully passed his portfolio review and advanced to candidacy in our PhD program. (12/9/2015)

  • Lisa Friedland successfully defended her dissertation. (12/8/2015)

  • Matthew Rattigan (PhD 2012) was featured in an article by the UMass Alumni Association. (12/4/2015)

  • I was promoted to Professor. (9/1/2015)


Below are selected papers and talks about my current and past research.  For additional information on publications, see my research group’s web pages or relevant pages at Google Scholar, Academia.edu, ResearchGate, and LinkedIn.

Causal Modeling

Learning the structure of causal models with relational and temporal dependence. Katerina Marazopoulou, Marc Maier, and David Jensen (2015). UAI.

Reasoning about independence in probabilistic models of relational data. Marc Maier, Katerina Marazopoulou, and David Jensen (2014). arXiv:1302.4381.

A sound and complete algorithm for learning causal models from relational data. Marc Maier, Katerina Marazopoulou, David Arbour, and David Jensen (2013). UAI.

Learning causal models of relational domains. Marc Maier, Brian Taylor, Huseyin Oktay, and David Jensen (2010). AAAI.

Relational blocking for causal discovery. Matthew Rattigan, Marc Maier, and David Jensen (2011). AAAI.

Automatic identification of quasi-experimental designs for discovering causal knowledge. David Jensen, Andrew Fast, Brian Taylor, and Marc Maier (2008).  SIGKDD.

Computational social science. David Jensen (2010). SIGKDD Keynote Address.

Statistical Relational Learning

Relational dependency networks. Jennifer Neville and David Jensen (2007). JMLR.

Why collective inference improves relational classification. David Jensen, Jennifer Neville, and Brian Gallagher (2004). SIGKDD.

Why stacked models perform effective collective classification. Andrew Fast and David Jensen (2008). ICDM.

Learning relational probability trees. Jennifer Neville, David Jensen, Lisa Friedland, and Michael Hay (2003). SIGKDD.

Simple estimators for relational Bayesian classifiers. Jennifer Neville, David Jensen, and Brian Gallagher (2003). ICDM.

Linkage and autocorrelation cause feature selection bias in relational learning. David Jensen and Jennifer Neville (2002). ICML.

Leveraging relational autocorrelation with latent group models. Jennifer Neville and David Jensen (2005). ICDM.

Navigation and Routing in Networks

Navigating networks by using homophily and degree. Özgür Şimşek and David Jensen (2008). PNAS.

Using structure indices for efficient approximation of network properties. Matthew Rattigan, Marc Maier, and David Jensen (2006). SIGKDD.

Indexing network structure with shortest-path trees. Marc Maier, Matthew Rattigan, and David Jensen (2011). ACM TKDD.

MaxProp: Routing for vehicle-based disruption-tolerant networks. John Burgess, Brian Gallagher, David Jensen, and Brian Levine (2006). INFOCOM.

Creating social networks to improve peer-to-peer networking. Andrew Fast, David Jensen, and Brian Levine (2005). SIGKDD.

Privacy and Networks

Resisting structural re-identification in anonymized social networks. Michael Hay, Gerome Miklau, David Jensen, Don Towsley, and Philipp Weis (2008).  PVLDB.

Accurate estimation of the degree distribution of private networks. Michael Hay, Chao Li, Gerome Miklau, and David Jensen (2009). ICDM.

Privacy vulnerabilities in encrypted HTTP streams. George Bissias, Marc Liberatore, David Jensen, and Brian Levine (2006). PET.

Fraud Detection and Security

Using relational knowledge discovery to prevent securities fraud. Jennifer Neville, Özgür Şimşek, David Jensen, John Komoroske, Kelly Palmer, and Henry Goldberg (2005). SIGKDD.

Detecting insider threats in a real corporate database of computer usage activity. Ted Senator, Henry Goldberg, Alex Memory, [27 other authors]...Daniel Corkill, Lisa Friedland, Amanda Gentzel, and David Jensen (2013). SIGKDD.

Citation Analysis

Exploiting relational structure to understand publication patterns in high-energy physics. Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen (2003). SIGKDD Explorations.

Recommending citations for academic papers. Trevor Strohman, W. Bruce Croft, and David Jensen (2007). SIGIR.

Social Media Analysis

Causal discovery in social media using quasi-experimental designs. Huseyin Oktay, Brian Taylor, and David Jensen (2010). SIGKDD Workshop.

Online dating recommendations: Matching markets and learning preferences. Kun Tu, Bruno Ribeiro, David Jensen, Don Towsley, Benyuan Liu, Hua Jiang, and Xiaodong Wang (2014).  WWW Workshop.

Overfitting and Multiple Comparisons

Multiple comparisons in induction algorithms. David Jensen and Paul Cohen (2000). MLJ.

The effects of training set size on decision tree complexity.  Tim Oates and David Jensen (1997). ICML.


Recent courses include:

Introduction to Knowledge Discovery (CMPSCI 348), Spring 2015 & 2016

Reasoning Under Uncertainty (CMPSCI 240), Fall 2014

Research Methods in Empirical Computer Science (CMPSCI 691DD), Spring 2014

Artificial Intelligence (CMPSCI 383), Fall 2013