Fusing Data with Correlations
by Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava
Abstract:
A surprising query result is often an indication of errors in the query or the underlying data. Recent work suggests using causal reasoning to find explanations for the surprising result. In practice, however, one often has multiple queries and/or multiple answers, some of which may be considered correct and others unexpected. In this paper, we focus on determining the causes of a set of unexpected results, possibly conditioned on some prior knowledge of the correctness of another set of results. We call this problem View-Conditioned Causality. We adapt the definitions of causality and responsibility for the case of multiple answers/views and provide a non-trivial algorithm that reduces the problem of finding causes and their responsibility to a satisfiability problem that can be solved with existing tools. We evaluate both the accuracy and effectiveness of our approach on a real dataset of user-generated mobile device tracking data, and demonstrate that it can identify causes of error more effectively than static Boolean influence and alternative notions of causality.
Citation:
Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, and Divesh Srivastava, Fusing Data with Correlations, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2014, pp. 433–444.
BibTeX:
@inproceedings{DBLP:conf/sigmod/PochampallyDDMS14,
    Abstract = {A surprising query result is often an indication of errors in
    the query or the underlying data. Recent work suggests using causal
    reasoning to find explanations for the surprising result. In practice,
    however, one often has multiple queries and/or multiple answers, some of
    which may be considered correct and others unexpected. In this paper, we
    focus on determining the causes of a set of unexpected results, possibly
    conditioned on some prior knowledge of the correctness of another set of
    results. We call this problem View-Conditioned Causality. We adapt the
    definitions of causality and responsibility for the case of multiple
    answers/views and provide a non-trivial algorithm that reduces the problem
    of finding causes and their responsibility to a satisfiability problem
    that can be solved with existing tools. We evaluate both the accuracy and
    effectiveness of our approach on a real dataset of user-generated mobile
    device tracking data, and demonstrate that it can identify causes of error
    more effectively than static Boolean influence and alternative notions of
    causality.},
    Author = {Pochampally, Ravali and Das Sarma, Anish and Dong, Xin Luna and Meliou, Alexandra and Srivastava, Divesh},
    Booktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},
    Title = {\href{http://people.cs.umass.edu/ameli/projects/dataIntegration/papers/corrFusion-SIGMOD2014.pdf}{Fusing Data with Correlations}},
    Venue = {SIGMOD},
    address = {Snowbird, Utah},
    month = {June},
    Year = {2014},
    pages = {433--444},
    doi = {10.1145/2588555.2593674},
}