Software fault identification via dynamic analysis and machine learning
by Yuriy Brun
Abstract:
I propose a technique that identifies program properties that may indicate errors. The technique generates machine learning models of run-time program properties known to expose faults and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. I evaluate an implementation of the technique, the Fault Invariant Classifier, which demonstrates the efficacy of the error-finding technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. Given a set of properties produced by the program analysis, some of which are indicative of errors, the technique selects a subset of properties that are most likely to reveal an error. The experimental evaluation over 941,000 lines of code showed that, on average, a user must examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java programs to find a fault-revealing property. The technique increases the relevance (the concentration of properties that reveal errors) by a factor of 50 on average for C programs, and 4.8 for Java programs.
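The ranking step and the two measures reported above can be illustrated with a small sketch. It assumes that each property already carries a classifier score (higher meaning more likely to reveal a fault) and, for evaluation only, a ground-truth label. The `Property` type, the helper names, the scores, and the example invariant strings are all hypothetical; this is not the Fault Invariant Classifier's code. Relevance is computed as the fraction of fault-revealing properties in a set, and the improvement factor as the relevance of the top-ranked subset divided by the relevance of all properties, which is one interpretation of the abstract's definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Property:
    # Hypothetical record for one dynamically detected program property.
    text: str              # the property, e.g. an invariant such as "x > 0"
    score: float           # assumed classifier confidence that it reveals a fault
    fault_revealing: bool  # ground truth, known only when evaluating the technique


def rank(props: List[Property]) -> List[Property]:
    """Order properties from most to least likely to reveal a fault."""
    return sorted(props, key=lambda p: p.score, reverse=True)


def relevance(props: List[Property]) -> float:
    """Fraction of the given properties that reveal a fault (0.0 for an empty set)."""
    if not props:
        return 0.0
    return sum(p.fault_revealing for p in props) / len(props)


def first_fault_rank(props: List[Property]) -> Optional[int]:
    """1-based position of the first fault-revealing property in the ranking."""
    for position, p in enumerate(rank(props), start=1):
        if p.fault_revealing:
            return position
    return None


if __name__ == "__main__":
    # Made-up properties and scores, for illustration only.
    props = [
        Property("x > 0", 0.91, True),
        Property("size == orig(size)", 0.40, False),
        Property("result != null", 0.75, False),
        Property("i <= n", 0.20, False),
        Property("count >= 0", 0.10, False),
    ]
    top_k = rank(props)[:2]
    factor = relevance(top_k) / relevance(props)
    print(f"relevance of all properties: {relevance(props):.2f}")
    print(f"relevance of top 2: {relevance(top_k):.2f}")
    print(f"improvement factor: {factor:.1f}")
    print(f"rank of first fault-revealing property: {first_fault_rank(props)}")
```

With these made-up values, keeping the two highest-scored properties raises relevance from 0.20 to 0.50, a factor of 2.5, and the first fault-revealing property appears at rank 1. Averaging `first_fault_rank` over many programs gives a measure in the spirit of the "2.2 highest-ranked properties" figure.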
Citation:
Yuriy Brun, Software fault identification via dynamic analysis and machine learning, Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2003.
Bibtex:
@Mastersthesis{Brun03masters,
  author = {Yuriy Brun},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Brun03masters.pdf}{Software
  fault identification via dynamic analysis and machine learning}},
  venue = {M.Eng.},
  year = {2003},
  department = {Department of Electrical Engineering and Computer Science},
  school = {Massachusetts Institute of Technology},
  address = {Cambridge, {MA}, {USA}},
  month = {August},
  date = {16},
  url = {http://hdl.handle.net/1721.1/17939},

  note = {\href{http://hdl.handle.net/1721.1/17939}{URL:
  http://hdl.handle.net/1721.1/17939}},

  abstract = {I propose a technique that identifies program properties that may
  indicate errors. The technique generates machine learning models of run-time
  program properties known to expose faults, and applies these models to program
  properties of user-written code to classify and rank properties that may lead
  the user to errors. I evaluate an implementation of the technique, the Fault
  Invariant Classifier, which demonstrates the efficacy of the error-finding
  technique. The implementation uses dynamic invariant detection to generate
  program properties. It uses support vector machine and decision tree learning
  tools to classify those properties. Given a set of properties produced by the
  program analysis, some of which are indicative of errors, the technique
  selects a subset of properties that are most likely to reveal an error. The
  experimental evaluation over 941,000 lines of code showed that a user must
  examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java
  programs to find a fault-revealing property. The technique increases the
  relevance (the concentration of properties that reveal errors) by a factor of
  50 on average for C programs, and 4.8 for Java programs.},
}