Better Automatic Program Repair by Using Bug Reports and Tests Together
by Manish Motwani, Yuriy Brun
Abstract:

Automated program repair is already deployed in industry, but concerns remain about repair quality. Recent research has shown that one of the main reasons repair tools produce incorrect (but seemingly correct) patches is imperfect fault localization (FL). This paper demonstrates that combining information from natural-language bug reports and test executions when localizing bugs can have a significant positive impact on repair quality. By modifying existing repair tools to use FL that combines bug reports and tests, we are able to correctly repair 7 defects in Defects4J that no prior tools have repaired correctly.

We develop Blues, the first information-retrieval-based, statement-level FL technique that requires no training data. We further develop RAFL, the first unsupervised method for combining multiple FL techniques, which outperforms a supervised method. Using RAFL, we create SBIR by combining Blues with a spectrum-based FL (SBFL) technique. Evaluated on 815 real-world defects, SBIR consistently ranks buggy statements higher than its underlying techniques.

Finally, we modify three state-of-the-art repair tools, Arja, SequenceR, and SimFix, to use SBIR, SBFL, and Blues as their internal FL. We evaluate the quality of the produced patches on 689 real-world defects. Arja and SequenceR significantly benefit from SBIR: Arja using SBIR correctly repairs 28 defects, but only 21 using SBFL, and only 15 using Blues; SequenceR using SBIR correctly repairs 12 defects, but only 10 using SBFL, and only 4 using Blues. SimFix (which has internal mechanisms to overcome poor FL) correctly repairs 30 defects using SBIR and 30 using SBFL, but only 13 using Blues. Our promising findings motivate further research into combining data from bug reports and test executions for FL and program repair.

Citation:
Manish Motwani and Yuriy Brun, Better Automatic Program Repair by Using Bug Reports and Tests Together, in Proceedings of the 45th International Conference on Software Engineering (ICSE), 2023, pp. 1229–1241.
BibTeX:
@inproceedings{Motwani23icse,
  author = {Manish Motwani and Yuriy Brun},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Motwani23icse.pdf}{Better Automatic Program Repair by Using Bug Reports and Tests Together}},
  booktitle = {Proceedings of the 45th International Conference on Software Engineering (ICSE)},
  venue = {ICSE},
  address = {Melbourne, Australia},
  month = {May},
  pages = {1229--1241},
  date = {14--20},
  year = {2023},

  note = {ACM artifact badges granted: 
  \href{https://www.acm.org/publications/policies/artifact-review-and-badging-current}{\raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactAvailable}}~Artifact Available, 
  \raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactReusable}}~Artifact Reusable}.
  \href{https://doi.org/10.1109/ICSE48619.2023.00109}{DOI: 10.1109/ICSE48619.2023.00109}}, 
  doi = {10.1109/ICSE48619.2023.00109},

  accept = {$\frac{207}{796} \approx 26\%$},

  abstract = {<p>Automated program repair is already deployed in industry,
  but concerns remain about repair quality. Recent research has shown that
  one of the main reasons repair tools produce incorrect (but seemingly
  correct) patches is imperfect fault localization (FL). This paper
  demonstrates that combining information from natural-language bug reports
  and test executions when localizing bugs can have a significant positive
  impact on repair quality. By modifying existing repair tools to use FL that
  combines bug reports and tests, we are able to correctly repair 7 defects
  in Defects4J that no prior tools have repaired correctly.</p>

  <p>We develop Blues, the first information-retrieval-based,
  statement-level FL technique that requires no training data. We further
  develop RAFL, the first unsupervised method for combining multiple FL
  techniques, which outperforms a supervised method. Using RAFL, we create
  SBIR by combining Blues with a spectrum-based FL (SBFL) technique. Evaluated
  on 815 real-world defects, SBIR consistently ranks buggy statements higher
  than its underlying techniques.</p>

  <p>Finally, we modify three state-of-the-art repair tools, Arja, SequenceR,
  and SimFix, to use SBIR, SBFL, and Blues as their internal FL. We evaluate
  the quality of the produced patches on 689 real-world defects. Arja and
  SequenceR significantly benefit from SBIR: Arja using SBIR correctly
  repairs 28 defects, but only 21 using SBFL, and only 15 using Blues;
  SequenceR using SBIR correctly repairs 12 defects, but only 10 using SBFL,
  and only 4 using Blues. SimFix (which has internal mechanisms to overcome
  poor FL) correctly repairs 30 defects using SBIR and 30 using SBFL, but
  only 13 using Blues. Our promising findings motivate further research into
  combining data from bug reports and test executions for FL and program
  repair.</p>},

  fundedBy = {NSF CCF-1763423, NSF CCF-2210243},
}