David Jensen
Director, Knowledge Discovery Laboratory
College of Information & Computer Sciences
University of Massachusetts Amherst


I am a faculty member in the College of Information and Computer Sciences at the University of Massachusetts Amherst.  I direct the Knowledge Discovery Laboratory, which I founded in 2000.  I also serve as the Associate Director of the Computational Social Science Institute, an interdisciplinary effort at UMass to study social phenomena using computational tools and concepts.  From 1991 to 1995, I served as an analyst with the Office of Technology Assessment, an agency of the United States Congress.  I received my doctoral degree from Washington University in St. Louis in 1992.

My current research focuses on machine learning and data science for analyzing large social, technological, and computational systems.  In particular, I work on methods for constructing accurate causal models from observational and experimental data, with applications to social science, fraud detection, security, and systems management.  My research is supported by many organizations, including the National Science Foundation, the Defense Advanced Research Projects Agency, and the Intelligence Advanced Research Projects Activity.

I regularly serve on program committees for several conferences, including the International Conference on Machine Learning, the Conference on Uncertainty in Artificial Intelligence, the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and the IEEE International Conference on Data Mining.  I have also served on the Board of Directors of the ACM Special Interest Group on Knowledge Discovery and Data Mining (2005-2013), the Defense Science Study Group (2006-2007), and DARPA's Information Science and Technology Group (2007-2012).  In 2011, I received the Outstanding Teacher Award from the UMass College of Natural Sciences.  In 2017, one of my papers received the IEEE INFOCOM Test of Time Paper Award.



Below are selected papers and talks about my current and past research.  For additional information on publications, see Google Scholar, Academia.edu, ResearchGate, and my research group’s web pages.

Causal Modeling

Inferring causal effects in relational data. David Arbour, Daniel Garant, and David Jensen (2016). SIGKDD.

Inferring causal direction from relational data. David Arbour, Katerina Marazopoulou, and David Jensen (2016). UAI.

Evaluating causal models by comparing interventional distributions. Daniel Garant and David Jensen (2016). SIGKDD Workshop on Causation.

Learning the structure of causal models with relational and temporal dependence. Katerina Marazopoulou, Marc Maier, and David Jensen (2015). UAI.

Reasoning about independence in probabilistic models of relational data. Marc Maier, Katerina Marazopoulou, and David Jensen (2014). arXiv:1302.4381.

A sound and complete algorithm for learning causal models from relational data. Marc Maier, Katerina Marazopoulou, David Arbour, and David Jensen (2013). UAI.

Learning causal models of relational domains. Marc Maier, Brian Taylor, Huseyin Oktay, and David Jensen (2010). AAAI.

Relational blocking for causal discovery. Matthew Rattigan, Marc Maier, and David Jensen (2011). AAAI.

Automatic identification of quasi-experimental designs for discovering causal knowledge. David Jensen, Andrew Fast, Brian Taylor, and Marc Maier (2008).  SIGKDD.

Computational social science. David Jensen (2010). SIGKDD Keynote Address.

Statistical Relational Learning

Relational dependency networks. Jennifer Neville and David Jensen (2007). JMLR.

Why collective inference improves relational classification. David Jensen, Jennifer Neville, and Brian Gallagher (2004). SIGKDD.

Why stacked models perform effective collective classification. Andrew Fast and David Jensen (2008). ICDM.

Learning relational probability trees. Jennifer Neville, David Jensen, Lisa Friedland, and Michael Hay (2003). SIGKDD.

Simple estimators for relational Bayesian classifiers. Jennifer Neville, David Jensen, and Brian Gallagher (2003). ICDM.

Linkage and autocorrelation cause feature selection bias in relational learning. David Jensen and Jennifer Neville (2002). ICML.

Leveraging relational autocorrelation with latent group models. Jennifer Neville and David Jensen (2005). ICDM.

Navigation and Routing in Networks

Navigating networks by using homophily and degree.  Özgur Şimşek and David Jensen (2008). PNAS.

Using structure indices for efficient approximation of network properties. Matthew Rattigan, Marc Maier, and David Jensen (2006). SIGKDD.

Indexing network structure with shortest-path trees. Marc Maier, Matthew Rattigan, and David Jensen (2011). ACM TKDD.

MaxProp: Routing for vehicle-based disruption-tolerant networks. John Burgess, Brian Gallagher, David Jensen, and Brian Levine (2006). INFOCOM.

Creating social networks to improve peer-to-peer networking. Andrew Fast, David Jensen, and Brian Levine (2005). SIGKDD.

Privacy and Networks

Resisting structural re-identification in anonymized social networks. Michael Hay, Gerome Miklau, David Jensen, Don Towsley, and Philipp Weis (2008).  PVLDB.

Accurate estimation of the degree distribution of private networks. Michael Hay, Chao Li, Gerome Miklau, and David Jensen (2009). ICDM.

Privacy vulnerabilities in encrypted HTTP streams. George Bissias, Marc Liberatore, David Jensen, and Brian Levine (2006). PET.

Fraud Detection and Security

Using relational knowledge discovery to prevent securities fraud. Jennifer Neville, Özgur Şimşek, David Jensen, John Komoroske, Kelly Palmer, and Henry Goldberg (2005). SIGKDD.

Detecting insider threats in a real corporate database of computer usage activity. Ted Senator, Henry Goldberg, Alex Memory, [27 other authors]...Daniel Corkill, Lisa Friedland, Amanda Gentzel, and David Jensen (2013). SIGKDD.

Citation Analysis

Exploiting relational structure to understand publication patterns in high-energy physics. Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen (2003). SIGKDD Explorations.

Recommending citations for academic papers. Trevor Strohman, W. Bruce Croft, and David Jensen (2007). SIGIR.

Social Media Analysis

Causal discovery in social media using quasi-experimental designs. Huseyin Oktay, Brian Taylor, and David Jensen (2010). SIGKDD Workshop.

Online dating recommendations: Matching markets and learning preferences. Kun Tu, Bruno Ribeiro, David Jensen, Don Towsley, Benyuan Liu, Hua Jiang, and Xiaodong Wang (2014).  WWW Workshop.

Overfitting and Multiple Comparisons

Multiple comparisons in induction algorithms. David Jensen and Paul Cohen (2000). MLJ.

The effects of training set size on decision tree complexity.  Tim Oates and David Jensen (1997). ICML.


Recent and upcoming courses include:

Research Methods in Empirical Computer Science (CMPSCI 691DD), Spring 2017
Introduction to Knowledge Discovery (CMPSCI 348), Spring 2015-2017
Reasoning Under Uncertainty (CMPSCI 240), Fall 2014
Artificial Intelligence (CMPSCI 383), Fall 2013

Graduated students

David Arbour (PhD, 2017 (expected)) — Facebook
Lisa Friedland (PhD, 2016) — Northeastern University
Brian Taylor (PhD, 2015) — Amazon
Daniel Garant (MS, 2015) — C&S Wholesale Grocers
Lissa Baseman (MS, 2015) — Los Alamos National Laboratory
Marc Maier (PhD, 2014) — MassMutual
Phillip Kirlin (PhD, 2014) — Rhodes College
Matthew Rattigan (PhD, 2012) — University of Massachusetts Amherst
Andrew Fast (PhD, 2010) — Elder Research
Michael Hay (PhD, 2010) — Colgate University
Brian Gallagher (MS, 2004) — Lawrence Livermore National Laboratory
Ross Fairgrieve (MS, 2004) — Tumblr
Jennifer Neville (PhD, 2006) — Purdue University