by Kıvanç Muşlu, Yuriy Brun, Alexandra Meliou
Abstract:
Today, software systems that rely on data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A feasibility user study with 96 humans shows that ConTest was extremely effective in a setting with a data entry application at guarding against data errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.
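The idea of a test query that flags well-formed but implausible data can be illustrated with a minimal sketch. This is not the ConTest implementation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the `employees` table, the two test queries, and the helper names are all hypothetical. It also reruns the tests synchronously after each write for simplicity, whereas the abstract describes CDT as continuous and delay-free.

```python
import sqlite3

# Hypothetical test queries: each returns the ids of rows that look wrong,
# so an empty result means the test passes.
TEST_QUERIES = {
    "salary_is_positive": "SELECT id FROM employees WHERE salary <= 0",
    "age_in_plausible_range": "SELECT id FROM employees WHERE age NOT BETWEEN 16 AND 100",
}

def run_data_tests(conn):
    """Run every test query and return {test name: offending row ids} for failures."""
    failures = {}
    for name, query in TEST_QUERIES.items():
        rows = conn.execute(query).fetchall()
        if rows:
            failures[name] = [row[0] for row in rows]
    return failures

def insert_and_test(conn, row):
    """Apply one data-entry update, then rerun the tests and return any warnings."""
    conn.execute("INSERT INTO employees (id, age, salary) VALUES (?, ?, ?)", row)
    conn.commit()
    return run_data_tests(conn)

# In-memory database standing in for the real data store (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, age INTEGER, salary REAL)")

ok = insert_and_test(conn, (1, 34, 52000.0))
bad = insert_and_test(conn, (2, 340, 61000.0))  # well-formed integer, implausible age

# Warn without rejecting the write, so the user can decide whether it is an error.
for name, ids in bad.items():
    print(f"warning: test {name} failed for rows {ids}")
```

The second insert is accepted by the schema because 340 is a valid integer, yet the range test flags it. The write is not blocked; the sketch only reports a warning, which leaves room for false positives to be dismissed by the user.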
Citation:
Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou, Preventing Data Errors with Continuous Testing, in Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2015, pp. 373–384.
Related:
Extended and revised version of "Data Debugging with Continuous Testing" in ESEC-FSE NI 2013.
Bibtex:
@inproceedings{Muslu15issta,
author = {K{\i}van{\c{c}} Mu{\c{s}}lu and Yuriy Brun and Alexandra Meliou},
title =
{\href{http://people.cs.umass.edu/brun/pubs/pubs/Muslu15issta.pdf}{Preventing
Data Errors with Continuous Testing}},
booktitle = {Proceedings of the ACM SIGSOFT International Symposium on
Software Testing and Analysis (ISSTA)},
venue = {ISSTA},
month = {July},
year = {2015},
date = {12--17},
pages = {373--384},
address = {Baltimore, MD, USA},
doi = {10.1145/2771783.2771792},
previous = {Extended and revised version of ``Data
Debugging with Continuous Testing'' in ESEC-FSE NI 2013.},
note = {Extended and revised version of~\ref{Muslu13ni-fse}.
\href{https://doi.org/10.1145/2771783.2771792}{DOI: 10.1145/2771783.2771792}},
accept = {$\frac{33}{119} \approx 28\%$},
abstract = {Today, software systems that rely on data are ubiquitous, and
ensuring the data's quality is an increasingly important challenge as data
errors result in annual multi-billion dollar losses. While software
debugging and testing have received heavy research attention, less effort
has been devoted to data debugging: identifying system errors caused by
well-formed but incorrect data. We present continuous data testing (CDT), a
low-overhead, delay-free technique that quickly identifies likely data
errors. CDT continuously executes domain-specific test queries; when a test
fails, CDT unobtrusively warns the user or administrator. We implement CDT
in the ConTest prototype for the PostgreSQL database management system. A
feasibility user study with 96 humans shows that ConTest was extremely
effective in a setting with a data entry application at guarding against
data errors: With ConTest, users corrected 98.4\% of their errors, as
opposed to 40.2\% without, even when we injected 40\% false positives into
ConTest's output. Further, when using ConTest, users corrected data entry
errors 3.2 times faster than when using state-of-the-art methods.},
fundedBy = {NSF CCF-1349784, NSF IIS-1421322, NSF CCF-1446683,
NSF CCF-1453474, Google Inc. via the Faculty Research Award,
Microsoft Research via a SEIF award},
}