Tracing Data Errors with View-Conditioned Causality
by Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu
Abstract:
A surprising query result is often an indication of errors in the query or the underlying data. Recent work suggests using causal reasoning to find explanations for the surprising result. In practice, however, one often has multiple queries and/or multiple answers, some of which may be considered correct and others unexpected. In this paper, we focus on determining the causes of a set of unexpected results, possibly conditioned on some prior knowledge of the correctness of another set of results. We call this problem View-Conditioned Causality. We adapt the definitions of causality and responsibility for the case of multiple answers/views and provide a non-trivial algorithm that reduces the problem of finding causes and their responsibility to a satisfiability problem that can be solved with existing tools. We evaluate both the accuracy and effectiveness of our approach on a real dataset of user-generated mobile device tracking data, and demonstrate that it can identify causes of error more effectively than static Boolean influence and alternative notions of causality.
Citation:
Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, and Dan Suciu, Tracing Data Errors with View-Conditioned Causality, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2011, pp. 505–516.
Bibtex:
@inproceedings{DBLP:conf/sigmod/MeliouGNS11,
    Abstract = {A surprising query result is often an indication of errors in
    the query or the underlying data. Recent work suggests using causal
    reasoning to find explanations for the surprising result. In practice,
    however, one often has multiple queries and/or multiple answers, some of
    which may be considered correct and others unexpected. In this paper, we
    focus on determining the causes of a set of unexpected results, possibly
    conditioned on some prior knowledge of the correctness of another set of
    results. We call this problem View-Conditioned Causality. We adapt the
    definitions of causality and responsibility for the case of multiple
    answers/views and provide a non-trivial algorithm that reduces the problem
    of finding causes and their responsibility to a satisfiability problem
    that can be solved with existing tools. We evaluate both the accuracy and
    effectiveness of our approach on a real dataset of user-generated mobile
    device tracking data, and demonstrate that it can identify causes of error
    more effectively than static Boolean influence and alternative notions of
    causality.},
    Author = {Alexandra Meliou and Wolfgang Gatterbauer and Suman Nath and Dan Suciu},
    Booktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},
    Pages = {505--516},
    Doi = {10.1145/1989323.1989376},
    Title = {\href{http://people.cs.umass.edu/ameli/projects/causality/papers/sigmod320-Meliou.pdf}{Tracing Data Errors with View-Conditioned Causality}},
    Venue = {SIGMOD},
    Address = {Athens, Greece},
    Month = jun,
    Year = {2011}
}