by Kıvanç Muşlu, Yuriy Brun, Alexandra Meliou
Abstract:
Today, software systems that use data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A user study with 96 humans shows that ConTest is extremely effective at guarding against data entry errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.
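The core loop described above (run domain-specific test queries against the data and warn, rather than block, when one fails) can be illustrated with a minimal sketch. It is not ConTest's implementation. It assumes Python's standard-library `sqlite3` in place of PostgreSQL so that it is self-contained. The `employees` table, the test queries in `TESTS`, and the helpers `run_data_tests` and `insert_and_test` are hypothetical. For simplicity the sketch re-runs the tests synchronously after each write, whereas the abstract describes CDT as low-overhead and delay-free.

```python
import sqlite3

# Hypothetical domain-specific test queries: each returns the rows that look
# wrong, so an empty result means the test passes.
TESTS = {
    "salary_in_plausible_range":
        "SELECT id, name, salary FROM employees "
        "WHERE salary < 10000 OR salary > 1000000",
    "no_duplicate_names":
        "SELECT name, COUNT(*) FROM employees GROUP BY name HAVING COUNT(*) > 1",
}


def run_data_tests(conn):
    """Run every test query; return {test name: offending rows} for failures."""
    failures = {}
    for name, query in TESTS.items():
        rows = conn.execute(query).fetchall()
        if rows:
            failures[name] = rows
    return failures


def insert_and_test(conn, name, salary):
    """Insert a row, re-run the tests, and warn instead of rejecting the write."""
    conn.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", (name, salary))
    conn.commit()
    failures = run_data_tests(conn)
    for test, rows in failures.items():
        # Warn only; the row stays in place so the user can decide whether to fix it.
        print(f"warning: test {test!r} failed on {rows}")
    return failures


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
ok = insert_and_test(conn, "Ada", 85000)      # plausible value: no warning
bad = insert_and_test(conn, "Grace", 9500000)  # extra-zero entry error: warns
```

Returning the offending rows, not just pass/fail, lets the warning point the user at the likely erroneous entry, which mirrors the abstract's goal of helping users correct data entry errors.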
Citation:
Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou, Preventing Data Errors with Continuous Testing, in Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2015, pp. 373–384.
Bibtex:
@inproceedings{Muslu15issta,
author = {K{\i}van{\c{c}} Mu{\c{s}}lu and Yuriy Brun and Alexandra Meliou},
title =
{\href{http://people.cs.umass.edu/brun/pubs/pubs/Muslu15issta.pdf}{Preventing Data Errors with Continuous Testing}},
booktitle = {Proceedings of the ACM SIGSOFT International Symposium on
Software Testing and Analysis (ISSTA)},
venue = {ISSTA},
month = jul,
year = {2015},
date = {12--17},
pages = {373--384},
address = {Baltimore, MD, USA},
doi = {10.1145/2771783.2771792},
accept = {$\frac{33}{119} \approx 28\%$},
abstract = {Today, software systems that use data are ubiquitous, and ensuring the data's
quality is an increasingly important challenge as data errors result in
annual multi-billion dollar losses. While software debugging and testing have
received heavy research attention, less effort has been devoted to data
debugging: identifying system errors caused by well-formed but incorrect
data. We present continuous data testing (CDT), a low-overhead, delay-free
technique that quickly identifies likely data errors. CDT continuously
executes domain-specific test queries; when a test fails, CDT unobtrusively
warns the user or administrator. We implement CDT in the ConTest prototype
for the PostgreSQL database management system. A user study with 96 humans
shows that ConTest is extremely effective at guarding against data entry
errors: With ConTest, users corrected 98.4\% of their errors, as opposed to
40.2\% without, even when we injected 40\% false positives into ConTest's
output. Further, when using ConTest, users corrected data entry errors 3.2
times faster than when using state-of-the-art methods.},
}