Repairing Programs with Semantic Code Search

Ke, Yalin; Stolee, Kathryn T.; Goues, Claire Le; Brun, Yuriy

doi:10.1109/ASE.2015.60

by Yalin Ke, Kathryn T. Stolee, Claire Le Goues, Yuriy Brun

Abstract:

Automated program repair can potentially reduce debugging costs and improve software quality but recent studies have drawn attention to shortcomings in the quality of automatically generated repairs. We propose a new kind of repair that uses the large body of existing open-source code to find potential fixes. The key challenges lie in efficiently finding code semantically similar (but not identical) to defective code and then appropriately integrating that code into a buggy program. We present SearchRepair, a repair technique that addresses these challenges by (1) encoding a large database of human-written code fragments as SMT constraints on input-output behavior, (2) localizing a given defect to likely buggy program fragments and deriving the desired input-output behavior for code to replace those fragments, (3) using state-of-the-art constraint solvers to search the database for fragments that satisfy that desired behavior and replacing the likely buggy code with these potential patches, and (4) validating that the patches repair the bug against program test suites. We find that SearchRepair repairs 150 (19%) of 778 benchmark C defects written by novice students, 20 of which are not repaired by GenProg, TrpAutoRepair, and AE. We compare the quality of the patches generated by the four techniques by measuring how many independent, not-used-during-repair tests they pass, and find that SearchRepair-repaired programs pass 97.3% of the tests, on average, whereas GenProg-, TrpAutoRepair-, and AE-repaired programs pass 68.7%, 72.1%, and 64.2% of the tests, respectively. We conclude that SearchRepair produces higher-quality repairs than GenProg, TrpAutoRepair, and AE, and repairs some defects those tools cannot.

View PDF

Citation:

Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun, Repairing Programs with Semantic Code Search, in Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 295–306.

Bibtex:

@inproceedings{Ke15ase,
  author = {Yalin Ke and Kathryn T. Stolee and Claire {Le Goues} and Yuriy Brun},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Ke15ase.pdf}{Repairing
  Programs with Semantic Code Search}},
  booktitle = {Proceedings of the 30th IEEE/ACM International Conference on
  Automated Software Engineering (ASE)}, 
	venue = {ASE},

  address = {Lincoln, NE, USA},
  month = {November},
  date = {9--13},
  year = {2015},
  doi = {10.1109/ASE.2015.60},  
  note = {\href{https://doi.org/10.1109/ASE.2015.60}{DOI: 10.1109/ASE.2015.60}},
  pages = {295--306},
  accept = {$\frac{55}{289} \approx 19\%$},

  abstract = {<p>Automated program repair can potentially reduce debugging
  costs and improve software quality but recent studies have drawn attention
  to shortcomings in the quality of automatically generated repairs. We
  propose a new kind of repair that uses the large body of existing
  open-source code to find potential fixes. The key challenges lie in
  efficiently finding code semantically similar (but not identical) to
  defective code and then appropriately integrating that code into a buggy
  program. We present SearchRepair, a repair technique that addresses these
  challenges by (1) encoding a large database of human-written code fragments
  as SMT constraints on input-output behavior, (2) localizing a given defect
  to likely buggy program fragments and deriving the desired input-output
  behavior for code to replace those fragments, (3) using state-of-the-art
  constraint solvers to search the database for fragments that satisfy that
  desired behavior and replacing the likely buggy code with these potential
  patches, and (4) validating that the patches repair the bug against program
  test suites. We find that SearchRepair repairs 150 (19%) of 778 benchmark C
  defects written by novice students, 20 of which are not repaired by
  GenProg, TrpAutoRepair, and AE. We compare the quality of the patches
  generated by the four techniques by measuring how many independent,
  not-used-during-repair tests they pass, and find that SearchRepair-repaired
  programs pass 97.3% of the tests, on average, whereas GenProg-,
  TrpAutoRepair-, and AE-repaired programs pass 68.7%, 72.1%, and 64.2% of
  the tests, respectively. We conclude that SearchRepair produces
  higher-quality repairs than GenProg, TrpAutoRepair, and AE, and repairs
  some defects those tools cannot. </p>}, 

  fundedBy = {NSF CCF-1446683, NSF CCF-1446932, NSF CCF-1446966, NSF CCF-1453474},
}