Automated program repair is already deployed in industry, but concerns remain about repair quality. Recent research has shown that one of the main reasons repair tools produce incorrect (but seemingly correct) patches is imperfect fault localization (FL). This paper demonstrates that combining information from natural-language bug reports and test executions when localizing bugs can have a significant positive impact on repair quality. By modifying existing repair tools to use FL that combines bug reports and tests, we are able to correctly repair 7 defects in Defects4J that no prior tools have repaired correctly.
We develop Blues, the first information-retrieval-based, statement-level FL technique that requires no training data. We further develop RAFL, the first unsupervised method for combining multiple FL techniques, which outperforms a supervised method. Using RAFL, we create SBIR by combining Blues with a spectrum-based (SBFL) technique. Evaluated on 815 real-world defects, SBIR consistently ranks buggy statements higher than its underlying techniques.
Finally, we modify three state-of-the-art repair tools, Arja, SequenceR, and SimFix, to use SBIR, SBFL, and Blues as their internal FL. We evaluate the quality of the produced patches on 689 real-world defects. Arja and SequenceR significantly benefit from SBIR: Arja using SBIR correctly repairs 28 defects, but only 21 using SBFL, and only 15 using Blues; SequenceR using SBIR correctly repairs 12 defects, but only 10 using SBFL, and only 4 using Blues. SimFix (which has internal mechanisms to overcome poor FL) correctly repairs 30 defects using SBIR and SBFL, but only 13 using Blues. Our promising findings direct further research into combining data from bug reports and test executions for FL and program repair.
@inproceedings{Motwani23icse,
  author = {Manish Motwani and Yuriy Brun},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Motwani23icse.pdf}{Better Automatic Program Repair by Using Bug Reports and Tests Together}},
  booktitle = {Proceedings of the 45th International Conference on Software Engineering (ICSE)},
  venue = {ICSE},
  address = {Melbourne, Australia},
  month = {May},
  pages = {1229--1241},
  date = {14--20},
  year = {2023},
  note = {ACM artifact badges granted: \href{https://www.acm.org/publications/policies/artifact-review-and-badging-current}{\raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactAvailable}}~Artifact Available, \raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactReusable}}~Artifact Reusable}. \href{https://doi.org/10.1109/ICSE48619.2023.00109}{DOI: 10.1109/ICSE48619.2023.00109}},
  doi = {10.1109/ICSE48619.2023.00109},
  accept = {$\frac{207}{796} \approx 26\%$},
  abstract = {<p>Automated program repair is already deployed in industry, but concerns remain about repair quality. Recent research has shown that one of the main reasons repair tools produce incorrect (but seemingly correct) patches is imperfect fault localization (FL). This paper demonstrates that combining information from natural-language bug reports and test executions when localizing bugs can have a significant positive impact on repair quality. By modifying existing repair tools to use FL that combines bug reports and tests, we are able to correctly repair 7 defects in Defects4J that no prior tools have repaired correctly.</p>
    <p>We develop Blues, the first information-retrieval-based, statement-level FL technique that requires no training data. We further develop RAFL, the first unsupervised method for combining multiple FL techniques, which outperforms a supervised method. Using RAFL, we create SBIR by combining Blues with a spectrum-based (SBFL) technique. Evaluated on 815 real-world defects, SBIR consistently ranks buggy statements higher than its underlying techniques.</p>
    <p>Finally, we modify three state-of-the-art repair tools, Arja, SequenceR, and SimFix, to use SBIR, SBFL, and Blues as their internal FL. We evaluate the quality of the produced patches on 689 real-world defects. Arja and SequenceR significantly benefit from SBIR: Arja using SBIR correctly repairs 28 defects, but only 21 using SBFL, and only 15 using Blues; SequenceR using SBIR correctly repairs 12 defects, but only 10 using SBFL, and only 4 using Blues. SimFix (which has internal mechanisms to overcome poor FL) correctly repairs 30 defects using SBIR and SBFL, but only 13 using Blues. Our promising findings direct further research into combining data from bug reports and test executions for FL and program repair.</p>},
  fundedBy = {NSF CCF-1763423, NSF CCF-2210243},
}