Visualizing distributed system executions
by Ivan Beschastnikh, Perry Liu, Albert Xing, Patty Wang, Yuriy Brun, Michael D. Ernst
Abstract:
Distributed systems pose unique challenges for software developers. Understanding the system's communication topology and reasoning about concurrent activities of system hosts can be difficult. The standard approach, analyzing system logs, can be a tedious and complex process that involves reconstructing a system log from multiple hosts' logs, reconciling timestamps among hosts with non-synchronized clocks, and understanding what took place during the execution encoded by the log. This paper presents a novel approach for tackling three tasks frequently performed during analysis of distributed system executions: (1) understanding the relative ordering of events, (2) searching for specific patterns of interaction between hosts, and (3) identifying structural similarities and differences between pairs of executions. Our approach consists of XVector, which instruments distributed systems to capture partial ordering information that encodes the happens-before relation between events, and ShiViz, which processes the resulting logs and presents distributed system executions as interactive time-space diagrams. Two user studies with a total of 109 students and a case study with 2 developers showed that our method was effective, helping participants answer statistically significantly more system-comprehension questions correctly, with a very large effect size.
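The abstract says that XVector captures partial-ordering information that encodes the happens-before relation between events. Vector clocks are a common way to encode that relation, and the sketch below assumes that encoding, since the abstract does not name the mechanism. The names `tick`, `merge`, `happens_before`, and `concurrent` are hypothetical illustrations. They are not part of XVector's or ShiViz's actual API.

```python
from typing import Dict

# Assumption: a vector clock maps each host name to a logical event counter.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, host: str) -> VectorClock:
    """Local or send event: increment this host's own entry."""
    updated = dict(clock)
    updated[host] = updated.get(host, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, host: str) -> VectorClock:
    """Receive event: element-wise max with the sender's clock, then tick."""
    merged = dict(local)
    for h, t in received.items():
        merged[h] = max(merged.get(h, 0), t)
    return tick(merged, host)


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """a happens before b iff a <= b in every entry and a < b in at least one."""
    hosts = set(a) | set(b)
    all_leq = all(a.get(h, 0) <= b.get(h, 0) for h in hosts)
    some_lt = any(a.get(h, 0) < b.get(h, 0) for h in hosts)
    return all_leq and some_lt


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Two distinct events are concurrent if neither happens before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


# Hypothetical execution: host A sends to host B; host C acts independently.
a_send = tick({}, "A")
b_recv = merge({}, a_send, "B")
c_local = tick({}, "C")
print(happens_before(a_send, b_recv))
print(concurrent(a_send, c_local))
```

Comparing clocks this way yields the relative ordering of events that a time-space diagram displays: events related by happens-before are connected by a path in the diagram, while concurrent events are not.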
Citation:
Ivan Beschastnikh, Perry Liu, Albert Xing, Patty Wang, Yuriy Brun, and Michael D. Ernst, Visualizing distributed system executions, ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 29, no. 2, March 2020, pp. 9:1–9:38.
Bibtex:
@article{Beschastnikh20tosem,
  author = {Ivan Beschastnikh and Perry Liu and Albert Xing and Patty Wang and Yuriy Brun and Michael D. Ernst},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Beschastnikh20tosem.pdf}{Visualizing distributed system executions}},
  journal = {ACM Transactions on Software Engineering and Methodology (TOSEM)},
  venue = {TOSEM},
  year = {2020},
  volume = {29},
  number = {2},
  month = {March},
  pages = {9:1--9:38},
  issn = {1049-331X},

  doi = {10.1145/3375633},
  note = {\href{https://doi.org/10.1145/3375633}{DOI:
  10.1145/3375633}},
  abstract = {Distributed systems pose unique challenges for software developers.
  Understanding the system's communication topology and reasoning about
  concurrent activities of system hosts can be difficult. The standard
  approach, analyzing system logs, can be a tedious and complex process that
  involves reconstructing a system log from multiple hosts' logs, reconciling
  timestamps among hosts with non-synchronized clocks, and understanding what
  took place during the execution encoded by the log.
  This paper presents a novel approach for tackling three tasks frequently
  performed during analysis of distributed system executions: 
  (1) understanding the relative ordering of events,
  (2) searching for specific patterns of interaction between hosts, and
  (3) identifying structural similarities and differences between pairs
  of executions. Our approach consists of XVector, which instruments
  distributed systems to capture partial ordering information that encodes the
  happens-before relation between events, and ShiViz, which processes
  the resulting logs and presents distributed system executions as interactive
  time-space diagrams. Two user studies with a total of 109 students and a case
  study with 2 developers showed that our method was effective, helping
  participants answer statistically significantly more system-comprehension
  questions correctly, with a very large effect size.},

  fundedBy = {AFOSR FA8750-12-2-0107, AFOSR FA8750-15-C-0010, NSF CCF-1453474,
  NSF-1763423, NSERC Discovery grant, NSERC USRA program},
}