Automatic Mining of Specifications from Invocation Traces and Method Invariants
by Ivo Krka, Yuriy Brun, Nenad Medvidovic
Abstract:
Software library documentation often describes individual methods' APIs, but not the intended protocols and method interactions. This can lead to library misuse, and restrict runtime detection of protocol violations and automated verification of software that uses the library. Specification mining, if accurate, can help mitigate these issues, which has led to significant research into new model-inference techniques that produce FSM-based models from program invariants and execution traces. However, there is currently a lack of empirical studies that, in a principled way, measure the impact of the inference strategies on model quality. To this end, we identify four such strategies and systematically study the quality of the models they produce for nine off-the-shelf, real-world libraries. We find that (1) using invariants to infer an initial model significantly improves model quality, increasing precision by 4% and recall by 41%, on average; (2) effective invariant filtering is crucial for quality and scalability of strategies that use invariants; and (3) using traces in combination with invariants greatly improves robustness to input noise. We present our empirical evaluation, implement new and extend existing model-inference techniques, and make public our implementations, subject libraries, ground-truth models, and experimental data. Our work can lead to higher-quality model inference, and directly improve the name techniques and tools that rely on model, specification, and API inference.
Citation:
Ivo Krka, Yuriy Brun, and Nenad Medvidovic, Automatic Mining of Specifications from Invocation Traces and Method Invariants, in Proceedings of the 22nd ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), 2014, pp. 178–189.
Related:
A previous version appeared as University of Southern California, Center for Software Engineering technical report USC-CSSE-2013-509.
Bibtex:
@inproceedings{Krka14fse,
  author = {Ivo Krka and Yuriy Brun and Nenad Medvidovic},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Krka14fse.pdf}{Automatic
  Mining of Specifications from Invocation Traces
  and Method Invariants}},
  booktitle = {Proceedings of the 22nd ACM SIGSOFT Symposium on the
  Foundations of Software Engineering (FSE)},
  venue = {FSE},
  month = {November},
  year = {2014},
  date = {16--22},
  address = {Hong Kong, China},
  accept = {$\frac{61}{273} \approx 22\%$},
  pages = {178--189},
  doi = {10.1145/2635868.2635890},
  
  note = {A previous version appeared as University of Southern
  California, Center for Software Engineering technical report
  USC-CSSE-2013-509.
  \href{https://doi.org/10.1145/2635868.2635890}{DOI:
  10.1145/2635868.2635890}},
  previous = {A previous version appeared as University of Southern
  California, Center for Software Engineering technical report
  USC-CSSE-2013-509.},

  abstract = {Software library documentation often describes
  individual methods' APIs, but not the intended protocols and method
  interactions. This can lead to library misuse, and restrict runtime
  detection of protocol violations and automated verification of
  software that uses the library. Specification mining, if accurate,
  can help mitigate these issues, which has led to significant
  research into new model-inference techniques that produce FSM-based
  models from program invariants and execution traces. However, there
  is currently a lack of empirical studies that, in a principled way,
  measure the impact of the inference strategies on model quality. To
  this end, we identify four such strategies and systematically study
  the quality of the models they produce for nine off-the-shelf,
  real-world libraries. We find that (1) using invariants to infer an
  initial model significantly improves model quality, increasing
  precision by 4% and recall by 41%, on average; (2) effective
  invariant filtering is crucial for quality and scalability of
  strategies that use invariants; and (3) using traces in combination
  with invariants greatly improves robustness to input noise. We
  present our empirical evaluation, implement new and extend existing
  model-inference techniques, and make public our implementations,
  subject libraries, ground-truth models, and experimental data. Our
  work can lead to higher-quality model inference, and directly
  improve the name techniques and tools that rely on model,
  specification, and API inference.},

  fundedBy = {NSF CCF-1117593, NSF CCF-1218115, NSF CCF-1321141, 
  Infosys Technologies Ltd},
}