Data Debugging with Continuous Testing
by Kıvanç Muşlu, Yuriy Brun, Alexandra Meliou
Abstract:
Today, systems rely as heavily on data as on the software that manipulates those data. Errors in these systems are incredibly costly, annually resulting in multi-billion dollar losses, and, on multiple occasions, in death. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: discovering system errors caused by well-formed but incorrect data. In this paper, we propose continuous data testing: using otherwise-idle CPU cycles to run test queries, in the background, as a user or database administrator modifies a database. This technique notifies the user or administrator about a data bug as quickly as possible after that bug is introduced, leading to at least three benefits: (1) The bug is discovered quickly and can be fixed before it is likely to cause a problem. (2) The bug is discovered while the relevant change is fresh in the user's or administrator's mind, increasing the chance that the underlying cause of the bug, as opposed to only the discovered side-effect, is fixed. (3) When poor documentation or company policies contribute to bugs, discovering the bug quickly is likely to identify these contributing factors, facilitating updating documentation and policies to prevent similar bugs in the future. We describe the problem space and potential benefits of continuous data testing, our vision for the technique, challenges we encountered, and our prototype implementation for PostgreSQL. The prototype's low overhead shows promise that continuous data testing can address the important problem of data debugging.
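To make the idea in the abstract concrete, here is a minimal sketch of test queries being rerun as a database is modified. It is not the authors' prototype: it uses Python's built-in sqlite3 module in place of PostgreSQL, and it reruns the tests synchronously after each change rather than in the background on idle CPU cycles. The `employees` table, the two test queries, and the helper names (`run_data_tests`, `apply_change`, `demo`) are all hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical test queries. Each one selects rows that violate an
# expectation, so a test passes when its query returns no rows.
TEST_QUERIES = {
    "salary_is_positive": "SELECT id FROM employees WHERE salary <= 0",
    "age_in_range": "SELECT id FROM employees WHERE age NOT BETWEEN 16 AND 100",
}


def run_data_tests(conn):
    """Return {test name: offending row ids} for every failing test."""
    failures = {}
    for name, query in TEST_QUERIES.items():
        rows = [row[0] for row in conn.execute(query)]
        if rows:
            failures[name] = rows
    return failures


def apply_change(conn, sql, params=()):
    """Apply a modification, then immediately rerun the test queries.

    Assumption: a real continuous-testing system would schedule the
    tests in the background; this sketch runs them inline.
    """
    conn.execute(sql, params)
    conn.commit()
    return run_data_tests(conn)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, salary REAL, age INTEGER)"
    )
    conn.execute("INSERT INTO employees VALUES (1, 52000, 34)")
    conn.commit()

    clean = run_data_tests(conn)  # tests run on the initial, correct data

    # A well-formed but incorrect update: the value has the right type,
    # but a sign error makes it wrong.
    broken = apply_change(
        conn, "UPDATE employees SET salary = ? WHERE id = ?", (-52000, 1)
    )
    conn.close()
    return clean, broken


if __name__ == "__main__":
    clean, broken = demo()
    print("before change:", clean)
    print("after change:", broken)
```

In this sketch the failing test is reported right after the faulty update, which is the point at which the person who made the change is most likely to recognize and correct its cause.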
Citation:
Kıvanç Muşlu, Yuriy Brun, and Alexandra Meliou, Data Debugging with Continuous Testing, in Proceedings of the New Ideas Track at the 9th Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE13), 2013, pp. 631–634.
Bibtex:
@inproceedings{Muslu13ni-fse,
  author = {K{\i}van{\c{c}} Mu{\c{s}}lu and Yuriy Brun and Alexandra Meliou},
  title = {\href{http://people.cs.umass.edu/ameli/projects/dataTesting/papers/Muslu13ni-fse.pdf}{Data
  Debugging with Continuous Testing}},
  booktitle = {Proceedings of the New Ideas Track at the 9th Joint Meeting of
  the European Software Engineering Conference and ACM SIGSOFT Symposium on the
  Foundations of Software Engineering ({ESEC/FSE}13)},
  venue = {ESEC/FSE},
  month = {August},
  year = {2013},
  date = {18--26},
  pages = {631--634},
  address = {Saint Petersburg, Russia},
  accept = {$\frac{12}{33} \approx 36\%$},

  abstract = {Today, systems rely as heavily on data as on the
  software that manipulates those data. Errors in these systems are
  incredibly costly, annually resulting in multi-billion dollar
  losses, and, on multiple occasions, in death. While software
  debugging and testing have received heavy research attention, less
  effort has been devoted to data debugging: discovering system errors
  caused by well-formed but incorrect data. In this paper, we propose
  continuous data testing: using otherwise-idle CPU cycles to run test
  queries, in the background, as a user or database administrator
  modifies a database. This technique notifies the user or
  administrator about a data bug as quickly as possible after that bug
  is introduced, leading to at least three benefits: (1) The bug is
  discovered quickly and can be fixed before it is likely to cause a
  problem. (2) The bug is discovered while the relevant change is
  fresh in the user's or administrator's mind, increasing the chance
  that the underlying cause of the bug, as opposed to only the
  discovered side-effect, is fixed. (3) When poor documentation or
  company policies contribute to bugs, discovering the bug quickly is
  likely to identify these contributing factors, facilitating updating
  documentation and policies to prevent similar bugs in the future. We
  describe the problem space and potential benefits of continuous data
  testing, our vision for the technique, challenges we encountered,
  and our prototype implementation for PostgreSQL. The prototype's low
  overhead shows promise that continuous data testing can address the
  important problem of data debugging.},
  doi = {10.1145/2491411.2494580},
}