Do Automated Program Repair Techniques Repair Hard and Important Bugs?
by Manish Motwani, Sandhya Sankaranarayanan, René Just, Yuriy Brun
Abstract:
Existing evaluations of automated repair techniques focus on the fraction of the defects for which the technique can produce a patch, the time needed to produce patches, and how well patches generalize to the intended specification. However, these evaluations have not focused on the applicability of repair techniques and the characteristics of the defects that these techniques can repair. Questions such as "Can automated repair techniques repair defects that are hard for developers to repair?" and "Are automated repair techniques less likely to repair defects that involve loops?" have not, as of yet, been answered. To address such questions, we annotate two large benchmarks totaling 409 C and Java defects in real-world software, ranging from 22K to 2.8M lines of code, with measures of the defect's importance, the developer-written patch's complexity, and the quality of the test suite. We then analyze relationships between these measures and the ability to produce patches for the defects of seven automated repair techniques -- AE, GenProg, Kali, Nopol, Prophet, SPR, and TrpAutoRepair. We find that automated repair techniques are less likely to produce patches for defects that required developers to write a lot of code or edit many files, or that have many tests relevant to the defect. Java techniques are more likely to produce patches for high-priority defects. Neither the time it took developers to fix a defect nor the test suite's coverage correlate with the automated repair techniques' ability to produce patches. Finally, automated repair techniques are less capable of fixing defects that require developers to add loops and new function calls, or to change method signatures. These findings identify strengths and shortcomings of the state-of-the-art of automated program repair along new dimensions. The presented methodology can drive research toward improving the applicability of automated repair techniques to hard and important bugs.
Citation:
Manish Motwani, Sandhya Sankaranarayanan, René Just, and Yuriy Brun, Do Automated Program Repair Techniques Repair Hard and Important Bugs?, Empirical Software Engineering (EMSE), vol. 23, no. 5, October 2018, pp. 2901–2947.
Bibtex:
@article{Motwani18emse,
  author = {Manish Motwani and Sandhya Sankaranarayanan and Ren{\'{e}} Just and Yuriy Brun},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Motwani18emse.pdf}{Do Automated Program Repair Techniques Repair Hard and Important Bugs?}},
  journal = {Empirical Software Engineering (EMSE)},
  venue = {EMSE},
  year = {2018},
  volume = {23},
  number = {5},
  month = {October},
  pages = {2901--2947},
  issn = {1382-3256},

  doi = {10.1007/s10664-017-9550-0},
  note = {\href{https://doi.org/10.1007/s10664-017-9550-0}{DOI: 10.1007/s10664-017-9550-0}},

  abstract = {Existing evaluations of automated repair techniques focus on the fraction of
  the defects for which the technique can produce a patch, the time needed to
  produce patches, and how well patches generalize to the intended
  specification. However, these evaluations have not focused on the
  applicability of repair techniques and the characteristics of the defects
  that these techniques can repair. Questions such as "Can automated repair
  techniques repair defects that are hard for developers to repair?" and "Are
  automated repair techniques less likely to repair defects that involve
  loops?" have not, as of yet, been answered. To address such questions, we
  annotate two large benchmarks totaling 409 C and Java defects in real-world
  software, ranging from 22K to 2.8M lines of code, with measures of the
  defect's importance, the developer-written patch's complexity, and the
  quality of the test suite. We then analyze relationships between these
  measures and the ability to produce patches for the defects of seven
  automated repair techniques -- AE, GenProg, Kali, Nopol, Prophet, SPR, and
  TrpAutoRepair. We find that automated repair techniques are less likely to
  produce patches for defects that required developers to write a lot of code
  or edit many files, or that have many tests relevant to the defect. Java
  techniques are more likely to produce patches for high-priority defects.
  Neither the time it took developers to fix a defect nor the test suite's
  coverage correlate with the automated repair techniques' ability to produce
  patches. Finally, automated repair techniques are less capable of fixing
  defects that require developers to add loops and new function calls, or to
  change method signatures. These findings identify strengths and shortcomings
  of the state-of-the-art of automated program repair along new dimensions. The
  presented methodology can drive research toward improving the applicability
  of automated repair techniques to hard and important bugs.},

  fundedBy = {NSF CCF-1453474, NSF CCF-1564162},
}