by Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, Dan Suciu
Abstract:
A surprising query result is often an indication of errors in the query or the underlying data. Recent work suggests using causal reasoning to find explanations for the surprising result. In practice, however, one often has multiple queries and/or multiple answers, some of which may be considered correct and others unexpected. In this paper, we focus on determining the causes of a set of unexpected results, possibly conditioned on some prior knowledge of the correctness of another set of results. We call this problem View-Conditioned Causality. We adapt the definitions of causality and responsibility for the case of multiple answers/views and provide a non-trivial algorithm that reduces the problem of finding causes and their responsibility to a satisfiability problem that can be solved with existing tools. We evaluate both the accuracy and effectiveness of our approach on a real dataset of user-generated mobile device tracking data, and demonstrate that it can identify causes of error more effectively than static Boolean influence and alternative notions of causality.
Citation:
Alexandra Meliou, Wolfgang Gatterbauer, Suman Nath, and Dan Suciu, Tracing Data Errors with View-Conditioned Causality, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2011, pp. 505–516.
Bibtex:
@inproceedings{DBLP:conf/sigmod/MeliouGNS11,
Abstract = {A surprising query result is often an indication of errors in
the query or the underlying data. Recent work suggests using causal
reasoning to find explanations for the surprising result. In practice,
however, one often has multiple queries and/or multiple answers, some of
which may be considered correct and others unexpected. In this paper, we
focus on determining the causes of a set of unexpected results, possibly
conditioned on some prior knowledge of the correctness of another set of
results. We call this problem View-Conditioned Causality. We adapt the
definitions of causality and responsibility for the case of multiple
answers/views and provide a non-trivial algorithm that reduces the problem
of finding causes and their responsibility to a satisfiability problem
that can be solved with existing tools. We evaluate both the accuracy and
effectiveness of our approach on a real dataset of user-generated mobile
device tracking data, and demonstrate that it can identify causes of error
more effectively than static Boolean influence and alternative notions of
causality.},
Author = {Alexandra Meliou and Wolfgang Gatterbauer and Suman Nath and Dan Suciu},
Booktitle = {Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD)},
Pages = {505--516},
Doi = {10.1145/1989323.1989376},
Title = {\href{http://people.cs.umass.edu/ameli/projects/causality/papers/sigmod320-Meliou.pdf}{Tracing Data Errors with View-Conditioned Causality}},
Venue = {SIGMOD},
Address = {Athens, Greece},
Month = jun,
Year = {2011}
}