by Kıvanç Muşlu, Yuriy Brun, Alexandra Meliou
Abstract:
Today, systems rely as heavily on data as on the software that manipulates those data. Errors in these systems are incredibly costly, annually resulting in multi-billion dollar losses, and, on multiple occasions, in death. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: discovering system errors caused by well-formed but incorrect data. In this paper, we propose continuous data testing: using otherwise-idle CPU cycles to run test queries, in the background, as a user or database administrator modifies a database. This technique notifies the user or administrator about a data bug as quickly as possible after that bug is introduced, leading to at least three benefits: (1) The bug is discovered quickly and can be fixed before it is likely to cause a problem. (2) The bug is discovered while the relevant change is fresh in the user's or administrator's mind, increasing the chance that the underlying cause of the bug, as opposed to only the discovered side-effect, is fixed. (3) When poor documentation or company policies contribute to bugs, discovering the bug quickly is likely to identify these contributing factors, facilitating updating documentation and policies to prevent similar bugs in the future. We describe the problem space and potential benefits of continuous data testing, our vision for the technique, challenges we encountered, and our prototype implementation for PostgreSQL. The prototype's low overhead shows promise that continuous data testing can address the important problem of data debugging.
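As a rough illustration of the idea in the abstract, the sketch below re-runs a set of test queries each time a database is modified and reports any rows that violate an expectation. It is not the paper's prototype: it assumes SQLite (via Python's standard `sqlite3` module) rather than PostgreSQL, runs the tests synchronously after each change rather than in the background on idle CPU cycles, and the `employees` table, the test queries, and the helper names `run_data_tests` and `modify_and_test` are all hypothetical.

```python
import sqlite3

# Hypothetical test queries: each selects rows that violate an expectation,
# so an empty result means the test passes.
TEST_QUERIES = {
    "salary_is_positive": "SELECT id, salary FROM employees WHERE salary <= 0",
    "salary_is_plausible": "SELECT id, salary FROM employees WHERE salary > 1000000",
}


def run_data_tests(conn):
    """Run every test query and return {test name: violating rows} for failures."""
    failures = {}
    for name, query in TEST_QUERIES.items():
        rows = conn.execute(query).fetchall()
        if rows:
            failures[name] = rows
    return failures


def modify_and_test(conn, statement, params=()):
    """Apply one modification, then immediately re-run the data tests.

    Simplification: the tests run synchronously here, not in the background.
    """
    conn.execute(statement, params)
    conn.commit()
    return run_data_tests(conn)


# In-memory database with a hypothetical schema, used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")

# A correct insert: no test query returns rows.
ok = modify_and_test(
    conn, "INSERT INTO employees (id, salary) VALUES (?, ?)", (1, 52000)
)

# A well-formed but incorrect insert (two extra zeros): the schema accepts it,
# but the plausibility test flags it right after the change.
bad = modify_and_test(
    conn, "INSERT INTO employees (id, salary) VALUES (?, ?)", (2, 5200000)
)

print(ok)
print(bad)
```

The second insert satisfies the schema yet fails a test query, which is the kind of well-formed but incorrect data the abstract describes. Reporting the failure immediately after the modification is what ties the bug to the change that introduced it.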
Citation:
Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou, Data Debugging with Continuous Testing, in Proceedings of the New Ideas Track at the 9th Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), 2013, pp. 631–634.
Bibtex:
@inproceedings{Muslu13ni-fse,
author = {K{\i}van{\c{c}} Mu{\c{s}}lu and Yuriy Brun and Alexandra Meliou},
title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Muslu13ni-fse.pdf}{Data
Debugging with Continuous Testing}},
booktitle = {Proceedings of the New Ideas Track at the 9th Joint Meeting of
the European Software Engineering Conference and ACM SIGSOFT Symposium on the
Foundations of Software Engineering (ESEC/FSE)},
venue = {ESEC/FSE NI},
month = {August},
year = {2013},
date = {18--26},
address = {Saint Petersburg, Russia},
accept = {$\frac{12}{33} \approx 36\%$},
pages = {631--634},
doi = {10.1145/2491411.2494580},
note = {\href{https://doi.org/10.1145/2491411.2494580}{DOI: 10.1145/2491411.2494580}},
abstract = {Today, systems rely as heavily on data as on the
software that manipulates those data. Errors in these systems are
incredibly costly, annually resulting in multi-billion dollar
losses, and, on multiple occasions, in death. While software
debugging and testing have received heavy research attention, less
effort has been devoted to data debugging: discovering system errors
caused by well-formed but incorrect data. In this paper, we propose
continuous data testing: using otherwise-idle CPU cycles to run test
queries, in the background, as a user or database administrator
modifies a database. This technique notifies the user or
administrator about a data bug as quickly as possible after that bug
is introduced, leading to at least three benefits: (1) The bug is
discovered quickly and can be fixed before it is likely to cause a
problem. (2) The bug is discovered while the relevant change is
fresh in the user's or administrator's mind, increasing the chance
that the underlying cause of the bug, as opposed to only the
discovered side-effect, is fixed. (3) When poor documentation or
company policies contribute to bugs, discovering the bug quickly is
likely to identify these contributing factors, facilitating updating
documentation and policies to prevent similar bugs in the future. We
describe the problem space and potential benefits of continuous data
testing, our vision for the technique, challenges we encountered,
and our prototype implementation for PostgreSQL. The prototype's low
overhead shows promise that continuous data testing can address the
important problem of data debugging.},
}