Finding latent code errors via machine learning over program executions
by Yuriy Brun, Michael D. Ernst
Abstract:
This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error. An implementation, the Fault Invariant Classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs, and 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error.
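The relevance measure mentioned in the abstract can be illustrated with a small sketch. It assumes relevance is the fraction of properties in a set that are fault-revealing, and that the reported improvement is the ratio of the relevance of the selected subset to that of the full set. The property strings, the labels, and the selected subset below are hypothetical, and the functions `relevance` and `improvement_factor` are illustrative names, not part of the Fault Invariant Classifier.

```python
from typing import List, Tuple

# Each property is (description, is_fault_revealing). All data is made up.
Property = Tuple[str, bool]


def relevance(props: List[Property]) -> float:
    """Fraction of the given properties that are fault-revealing."""
    if not props:
        return 0.0
    return sum(1 for _, revealing in props if revealing) / len(props)


def improvement_factor(all_props: List[Property],
                       selected: List[Property]) -> float:
    """Relevance of the selected subset divided by overall relevance."""
    base = relevance(all_props)
    if base == 0.0:
        raise ValueError("no fault-revealing properties in the full set")
    return relevance(selected) / base


# Hypothetical output of a program analysis: 8 properties, 1 fault-revealing.
all_props: List[Property] = [
    ("x >= 0", False),
    ("i < len(a)", True),
    ("y != null", False),
    ("n == size", False),
    ("a[0] <= a[1]", False),
    ("count >= 1", False),
    ("p != q", False),
    ("z is constant", False),
]

# Hypothetical subset that a classifier reported as likely fault-revealing.
selected: List[Property] = [all_props[1], all_props[4]]

base_relevance = relevance(all_props)
selected_relevance = relevance(selected)
factor = improvement_factor(all_props, selected)
print(base_relevance, selected_relevance, factor)
```

In this toy setting the classifier narrows eight properties down to two, one of which is fault-revealing, so the user inspects a set with a higher concentration of useful properties than the full analysis output.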
Citation:
Yuriy Brun and Michael D. Ernst, Finding latent code errors via machine learning over program executions, in Proceedings of the 26th International Conference on Software Engineering (ICSE), 2004, pp. 480–490.
Bibtex:
@inproceedings{Brun04icse,
  author = {Yuriy Brun and Michael D. Ernst},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Brun04icse.pdf}{Finding latent
  code errors via machine learning over program executions}},
  booktitle = {Proceedings of the 26th International Conference on Software
  Engineering (ICSE)},
  venue = {ICSE},
  address = {Edinburgh, Scotland},
  month = {May},
  date = {26--28},
  year = {2004},
  pages = {480--490},
  accept = {$\frac{58}{436} \approx 13\%$},
  doi = {10.1109/ICSE.2004.1317470},

  note = {\href{http://dx.doi.org/10.1109/ICSE.2004.1317470}{DOI:
  10.1109/ICSE.2004.1317470}},

  abstract = {This paper proposes a technique for identifying program properties
  that indicate errors. The technique generates machine learning models of
  program properties known to result from errors, and applies these models to
  program properties of user-written code to classify and rank properties that
  may lead the user to errors. Given a set of properties produced by the program
  analysis, the technique selects a subset of properties that are most likely to
  reveal an error. An implementation, the Fault Invariant Classifier,
  demonstrates the efficacy of the technique. The implementation uses dynamic
  invariant detection to generate program properties. It uses support vector
  machine and decision tree learning tools to classify those properties. In our
  experimental evaluation, the technique increases the relevance (the
  concentration of fault-revealing properties) by a factor of 50 on average for
  the C programs, and 4.8 for the Java programs. Preliminary experience suggests
  that most of the fault-revealing properties do lead a programmer to an
  error.},
}