The Plastic Surgery Hypothesis
by Earl T. Barr, Yuriy Brun, Premkumar Devanbu, Mark Harman, Federica Sarro
Abstract:
Recent work on genetic-programming-based approaches to automatic program patching has relied on the insight that the content of new code can often be assembled out of fragments of code that already exist in the code base. This insight has been dubbed the plastic surgery hypothesis; successful, well-known automatic repair tools such as GenProg rest on this hypothesis, but it has never been validated. We formalize and validate the plastic surgery hypothesis and empirically measure the extent to which raw material for changes actually already exists in projects. In this paper, we mount a large-scale study of several large Java projects, and examine a history of 15,723 commits to determine the extent to which these commits are graftable, i.e., can be reconstituted from existing code, and find an encouraging degree of graftability, surprisingly independent of commit size and type of commit. For example, we find that changes are 43% graftable from the exact version of the software being changed. With a view to investigating the difficulty of finding these grafts, we study the abundance of such grafts in three possible sources: the immediately previous version, prior history, and other projects. We also examine the contiguity or chunking of these grafts, and the degree to which grafts can be found in the same file. Our results are quite promising and suggest an optimistic future for automatic program patching methods that search for raw material in already extant code in the project being patched.
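To make the notion of graftability concrete, the following is a minimal sketch, not the authors' tool. It assumes line-level granularity: a line added by a commit counts as graftable if, after collapsing whitespace, it already appears somewhere in a donor code base (here, the version being changed). The helper names `normalize` and `graftability` and the toy Java snippets are hypothetical; the paper's actual matching and normalization rules may differ.

```python
# Hypothetical sketch: line-level graftability of a commit against a donor code base.
# Assumption: matching is exact after whitespace normalization; the paper's rules may differ.


def normalize(line: str) -> str:
    """Collapse runs of whitespace and trim the ends, so indentation alone never blocks a match."""
    return " ".join(line.split())


def graftability(added_lines: list[str], donor_lines: list[str]) -> float:
    """Return the fraction of non-blank added lines that already occur in the donor code."""
    donor = {normalize(line) for line in donor_lines}
    candidates = [normalize(line) for line in added_lines if line.strip()]
    if not candidates:
        return 0.0  # assumption: a commit that adds no non-blank lines scores 0
    grafted = sum(1 for line in candidates if line in donor)
    return grafted / len(candidates)


# Toy example: the parent version is the donor; the commit adds a null guard and a new line.
parent_version = [
    "int total = 0;",
    "for (int i = 0; i < n; i++) {",
    "    total += a[i];",
    "}",
    "return total;",
]
commit_added = [
    "if (a == null) {",
    "    return total;",
    "}",
    "int count = a.length;",
]

print(graftability(commit_added, parent_version))  # → 0.5
```

Building the donor set from prior history or from other projects instead would approximate the other two graft sources studied. Trivial lines such as a lone `}` match almost anywhere, so a real measurement would have to decide how to treat them.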
Citation:
Earl T. Barr, Yuriy Brun, Premkumar Devanbu, Mark Harman, and Federica Sarro, The Plastic Surgery Hypothesis, in Proceedings of the 22nd ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), 2014, pp. 306–317.
Bibtex:
@inproceedings{Barr14fse,
  author = {Earl T. Barr and Yuriy Brun and Premkumar Devanbu and Mark Harman and Federica Sarro},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Barr14fse.pdf}{The Plastic Surgery Hypothesis}},
  booktitle = {Proceedings of the 22nd ACM SIGSOFT Symposium on the
  Foundations of Software Engineering (FSE)},
  venue = {FSE},
  month = {November},
  year = {2014},
  date = {16--22},
  address = {Hong Kong, China},
  accept = {$\frac{61}{273} \approx 22\%$},
  pages = {306--317},

  note = {\href{http://dx.doi.org/10.1145/2635868.2635898}{DOI: 10.1145/2635868.2635898}},
  doi = {10.1145/2635868.2635898},

  abstract = {Recent work on genetic-programming-based approaches to automatic
  program patching has relied on the insight that the content of new
  code can often be assembled out of fragments of code that already
  exist in the code base. This insight has been dubbed the plastic
  surgery hypothesis; successful, well-known automatic repair tools such
  as GenProg rest on this hypothesis, but it has never been validated.
  We formalize and validate the plastic surgery hypothesis and
  empirically measure the extent to which raw material for changes
  actually already exists in projects. In this paper, we mount a
  large-scale study of several large Java projects, and examine a
  history of 15,723 commits to determine the extent to which these
  commits are graftable, i.e., can be reconstituted from existing code,
  and find an encouraging degree of graftability, surprisingly
  independent of commit size and type of commit. For example, we find
  that changes are 43% graftable from the exact version of the software
  being changed. With a view to investigating the difficulty of finding
  these grafts, we study the abundance of such grafts in three possible
  sources: the immediately previous version, prior history, and other
  projects. We also examine the contiguity or chunking of these grafts,
  and the degree to which grafts can be found in the same file. Our
  results are quite promising and suggest an optimistic future for
  automatic program patching methods that search for raw material in
  already extant code in the project being patched.},

  fundedBy = {EPSRC CREST Platform Grant EP/G060525, DAASE EP/J017515,
  NSF CCF-1247280, and NSF CCF-1446683},
}