by Yuriy Brun
Abstract:
I propose a technique that identifies program properties that may indicate errors. The technique generates machine learning models of run-time program properties known to expose faults, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. I evaluate an implementation of the technique, the Fault Invariant Classifier, which demonstrates the efficacy of the error-finding technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. Given a set of properties produced by the program analysis, some of which are indicative of errors, the technique selects a subset of properties that are most likely to reveal an error. The experimental evaluation over 941,000 lines of code showed that a user must examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java programs to find a fault-revealing property. The technique increases the relevance (the concentration of properties that reveal errors) by a factor of 50 on average for C programs and 4.8 for Java programs.
Citation:
Yuriy Brun, Software fault identification via dynamic analysis and machine learning, Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2003.
Bibtex:
@Mastersthesis{Brun03masters,
author = {Yuriy Brun},
title =
{\href{http://people.cs.umass.edu/brun/pubs/pubs/Brun03masters.pdf}{Software
fault identification via dynamic analysis and machine learning}},
venue = {M.Eng.},
year = {2003},
department = {Department of Electrical Engineering and Computer Science},
school = {Massachusetts Institute of Technology},
address = {Cambridge, {MA}, {USA}},
month = {August},
date = {16},
url = {http://hdl.handle.net/1721.1/17939},
note = {\href{http://hdl.handle.net/1721.1/17939}{\newline URL:
http://hdl.handle.net/1721.1/17939}},
abstract = {I propose a technique that identifies program properties that may
indicate errors. The technique generates machine learning models of run-time
program properties known to expose faults, and applies these models to program
properties of user-written code to classify and rank properties that may lead
the user to errors. I evaluate an implementation of the technique, the Fault
Invariant Classifier, which demonstrates the efficacy of the error-finding
technique. The implementation uses dynamic invariant detection to generate
program properties. It uses support vector machine and decision tree learning
tools to classify those properties. Given a set of properties produced by the
program analysis, some of which are indicative of errors, the technique
selects a subset of properties that are most likely to reveal an error. The
experimental evaluation over 941,000 lines of code showed that a user must
examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java
programs to find a fault-revealing property. The technique increases the
relevance (the concentration of properties that reveal errors) by a factor of
50 on average for C programs and 4.8 for Java programs.},
}