Debugging Distributed Systems
by Ivan Beschastnikh, Patty Wang, Yuriy Brun, Michael D. Ernst
Abstract:
Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system's communication topology can be difficult. A standard approach to gaining insight into system activity is to analyze system logs. Unfortunately, this can be a tedious and complex process. This article looks at several key features and debugging challenges that differentiate distributed systems from other kinds of software. The article presents several promising tools and ongoing research to help resolve these challenges.
Citation:
Ivan Beschastnikh, Patty Wang, Yuriy Brun, and Michael D. Ernst, Debugging Distributed Systems, Communications of the ACM, vol. 59, no. 8, August 2016, pp. 32–37.
Related:
A previous version appeared in ACM Queue 14(2):91–110, March/April 2016.
Bibtex:
@article{Beschastnikh16cacm,
  author = {Ivan Beschastnikh and Patty Wang and Yuriy Brun and Michael D. Ernst},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Beschastnikh16cacm.pdf}{Debugging Distributed Systems}},
  journal = {Communications of the ACM},
  venue = {CACM},
  year = {2016},
  month = {August},
  volume = {59},
  number = {8},
  pages = {32--37},
  doi = {10.1145/2909480},
  note = {A previous version appeared in ACM Queue 14(2):91--110, March/April 2016,
  \href{http://dx.doi.org/10.1145/2909480}{DOI: 10.1145/2909480}},
  previous = {A previous version appeared in ACM Queue 14(2):91--110,
  March/April 2016.},
	
  abstract = {Distributed systems pose unique challenges for software
  developers. Reasoning about concurrent activities of system nodes and even
  understanding the system's communication topology can be difficult. A
  standard approach to gaining insight into system activity is to analyze
  system logs. Unfortunately, this can be a tedious and complex process. This
  article looks at several key features and debugging challenges that
  differentiate distributed systems from other kinds of software. The article
  presents several promising tools and ongoing research to help resolve these
  challenges.},

  fundedBy = {NSF CCF-1453474, NSF CNS-1513055, DARPA FA8750-12-2-0107},
}