Understanding Why and Predicting When Developers Adhere to Code-Quality Standards
by Manish Motwani and Yuriy Brun
Abstract:

Static analysis tools are widely used in software development. While research has focused on improving tool accuracy, evidence at Microsoft suggests that developers often consider some accurately detected warnings not worth fixing: what these tools and developers consider to be true positives differs. Thus, improving tool utilization requires understanding when and why developers fix static-analysis warnings.

We conduct a case study of Microsoft's Banned API Standard used within the company, which describes 195 APIs that can potentially cause vulnerabilities and 142 recommended replacements. We find that developers often (84% of the time) consciously deviate from this standard, specifying their rationale, allowing us to study why and when developers deviate from standards. We then identify 23 factors that correlate with developers using the preferred APIs and build a model that predicts whether the developers would use the preferred or discouraged APIs under different circumstances with 92% accuracy. We also train a model to predict the kind of APIs developers would use in the future based on their past development activity, with 86% accuracy. We outline a series of concrete suggestions static analysis developers can use to prioritize and customize their output, potentially increasing their tools' usefulness.
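To make the preferred-versus-discouraged distinction concrete, below is a minimal C sketch of the kind of API pair such a standard covers. The specific pair shown (strcpy and its bounds-checked replacement strcpy_s) comes from Microsoft's publicly documented SDL banned-functions list and is assumed, not confirmed, to be among the 195 APIs and 142 replacements studied in the paper; compiling the preferred variant also assumes a C runtime that provides the Annex K _s functions (e.g., the Microsoft CRT).

#define __STDC_WANT_LIB_EXT1__ 1  /* request the bounds-checked _s functions (C11 Annex K) */
#include <stdio.h>
#include <string.h>

/* Discouraged: strcpy performs no bounds checking, so a source string
 * longer than the destination buffer silently overflows it. */
void greet_discouraged(const char *name) {
    char buf[16];
    strcpy(buf, name);          /* the kind of call a banned-API check would flag */
    printf("Hello, %s\n", buf);
}

/* Preferred: strcpy_s takes the destination size and reports an error
 * instead of overflowing (available in the Microsoft CRT and in C11
 * Annex K implementations). */
void greet_preferred(const char *name) {
    char buf[16];
    if (strcpy_s(buf, sizeof buf, name) != 0) {
        fprintf(stderr, "name too long, refusing to copy\n");
        return;
    }
    printf("Hello, %s\n", buf);
}

int main(void) {
    greet_discouraged("Ada");
    greet_preferred("Ada");
    return 0;
}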

Citation:
Manish Motwani and Yuriy Brun, Understanding Why and Predicting When Developers Adhere to Code-Quality Standards, in Proceedings of the Software Engineering in Practice Track at the 45th International Conference on Software Engineering (ICSE SEIP), 2023, pp. 432–444.
Bibtex:
@inproceedings{Motwani23icse-seip,
  author = {Manish Motwani and Yuriy Brun},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Motwani23icse-seip.pdf}{Understanding Why and Predicting When Developers Adhere to Code-Quality Standards}},
  booktitle = {Proceedings of the Software Engineering in Practice Track at the 45th International Conference on Software Engineering (ICSE SEIP)},
  venue = {ICSE SEIP},
  address = {Melbourne, Australia},
  month = {May},
  date = {14--20},
  pages = {432--444},
  year = {2023},
  accept = {$\frac{41}{146} \approx 28\%$},

  note = {\href{https://doi.org/10.1109/ICSE-SEIP58684.2023.00045}{DOI: 10.1109/ICSE-SEIP58684.2023.00045}, 
  arXiv: \href{https://arxiv.org/abs/2011.08340}{abs/2011.08340}.}, 
  doi = {10.1109/ICSE-SEIP58684.2023.00045},
  
  abstract = {<p>Static analysis tools are widely used in software development. While research
has focused on improving tool accuracy, evidence at Microsoft suggests that
developers often consider some accurately detected warnings not worth fixing:
what these tools and developers consider to be true positives differs. Thus,
improving tool utilization requires understanding when and why developers fix
static-analysis warnings.</p>

<p>We conduct a case study of Microsoft's Banned API Standard used within the
company, which describes 195 APIs that can potentially cause vulnerabilities
and 142 recommended replacements. We find that developers often (84% of the
time) consciously deviate from this standard, specifying their rationale,
allowing us to study why and when developers deviate from standards. We then
identify 23~factors that correlate with developers using the preferred APIs
and build a model that predicts whether the developers would use the
preferred or discouraged APIs under different circumstances with
92% accuracy. We also train a model to predict the kind of APIs developers
would use in the future based on their past development activity, with
86% accuracy. We outline a series of concrete suggestions static analysis
developers can use to prioritize and customize their output, potentially
increasing their tools' usefulness.</p>},

  fundedBy = {NSF CCF-1763423},
}