Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses to synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust's classifiers outperform the state of the art.
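As a rough illustration of the core idea only (this is not Avgust's actual code, and every name below is invented for the sketch), the following Python fragment shows what an app-agnostic state-machine encoding of a usage might look like, and how a test for a new target app could be walked out of it given some way to map abstract UI elements onto the target app's concrete widgets:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ScreenState:
    """App-agnostic label for a UI screen, e.g., as output by an image classifier."""
    label: str  # e.g., "sign_in_screen"

@dataclass(frozen=True)
class Interaction:
    """App-agnostic user action on a semantically labeled UI element."""
    element: str  # e.g., "username_field"
    action: str   # e.g., "type", "tap"

@dataclass
class UsageModel:
    """State machine distilled from video traces of one usage (e.g., sign-in).
    Hypothetical sketch; Avgust's real encoding and synthesis differ."""
    transitions: dict = field(default_factory=dict)  # (ScreenState, Interaction) -> ScreenState

    def add_trace(self, trace):
        """Fold one video's classified (state, interaction, next-state) steps into the machine."""
        for src, interaction, dst in trace:
            self.transitions[(src, interaction)] = dst

    def synthesize_test(self, start, match_element, max_steps=50):
        """Walk the machine from `start`, mapping each abstract interaction onto a
        concrete widget of the target app via the caller-supplied `match_element`
        lookup; returns a list of (action, widget) test steps."""
        steps, state = [], start
        for _ in range(max_steps):  # bound the walk so cyclic usages terminate
            candidates = [(i, dst) for (s, i), dst in self.transitions.items() if s == state]
            if not candidates:
                break
            interaction, state = candidates[0]
            steps.append((interaction.action, match_element(interaction.element)))
        return steps

In this sketch, a caller would populate the model with add_trace once per classified video, then call synthesize_test with a matcher that searches the target app's current screen for the widget best matching each abstract element label; that matching step is where the hard semantic-understanding work described in the abstract lives.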
@inproceedings{Zhao22fse,
  author    = {Yixue Zhao and Saghar Talebipour and Kesina Baral and Hyojae Park and Leon Yee and Safwat Ali Khan and Yuriy Brun and Nenad Medvidovic and Kevin Moran},
  title     = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Zhao22fse.pdf}{{\textsc{Avgust}}: {Automating} Usage-Based Test Generation from Videos of App Executions}},
  booktitle = {Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
  venue     = {ESEC/FSE},
  month     = {November},
  year      = {2022},
  date      = {14--18},
  pages     = {421--433},
  address   = {Singapore},
  doi       = {10.1145/3540250.3549134},
  note      = {ACM artifact badges granted: \href{https://www.acm.org/publications/policies/artifact-review-and-badging-current}{\raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactAvailable}}~Artifact Available, \raisebox{-.75ex}{\includegraphics[height=2.5ex]{ACMArtifactFunctional}}~Artifact Functional}. \href{https://doi.org/10.1145/3540250.3549134}{DOI: 10.1145/3540250.3549134}, arXiv: \href{https://arxiv.org/abs/2209.02577}{abs/2209.02577}},
  accept    = {$\frac{99}{449} \approx 22\%$},
  abstract  = {<p>Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses to synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust's classifiers outperform the state of the art.</p>},
  fundedBy  = {NSF CCF-1717963, NSF CCF-1763423, NSF CNS-1823354, NSF CCF-1955853, NSF CCF-2030859 (to the CRA for CIFellows), U.S. Office of Naval Research N00014-17-1-2896},
}