Fairness Testing: Testing Software for Discrimination
by Sainyam Galhotra, Yuriy Brun, Alexandra Meliou
Abstract:

This paper defines the notions of software fairness and discrimination and develops a testing-based method for measuring whether, and how much, software discriminates. Specifically, the paper focuses on measuring causality in discriminatory behavior. Modern software contributes to important societal decisions, and evidence of software discrimination has been found in systems that recommend criminal sentences, grant access to financial loans and products, and determine who is allowed to participate in promotions and receive services. Our approach, Themis, measures discrimination in software by generating efficient, discrimination-testing test suites. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and, notably, does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.
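
The central idea in the abstract, measuring causal discrimination without an oracle, can be illustrated with a minimal Python sketch. It assumes the system under test is a black-box function over a dictionary of attributes and that the schema lists a finite set of valid values for each attribute. For each randomly drawn input, only the sensitive attribute is varied; if the output changes, that input counts as discriminating, so no expected output (oracle) is needed. The names `causal_discrimination_score`, `schema`, and `toy_loan_model` are hypothetical and not Themis's interface, and the sketch uses plain fixed-size random sampling rather than the optimizations the paper evaluates.

```python
import random
from typing import Callable, Dict, Sequence

# Hypothetical types: a schema maps each attribute to its valid values,
# and the system under test maps an input dict to a (hashable) decision.
Schema = Dict[str, Sequence[object]]
Decision = Callable[[Dict[str, object]], object]


def causal_discrimination_score(
    decide: Decision,
    schema: Schema,
    sensitive: str,
    samples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of inputs whose output changes when only the
    sensitive attribute is altered (illustrative helper, not Themis's API)."""
    rng = random.Random(seed)
    discriminating = 0
    for _ in range(samples):
        # Draw one random valid input from the schema.
        base = {attr: rng.choice(list(values)) for attr, values in schema.items()}
        # Re-run the same input once per value of the sensitive attribute,
        # keeping every other attribute fixed.
        outputs = {decide({**base, sensitive: v}) for v in schema[sensitive]}
        # No oracle required: differing outputs alone flag discrimination.
        if len(outputs) > 1:
            discriminating += 1
    return discriminating / samples


# Hypothetical schema and a deliberately biased toy decision function.
schema: Schema = {
    "gender": ["female", "male"],
    "income": list(range(0, 100_001, 10_000)),  # 11 income levels
    "age": list(range(18, 70)),
}


def toy_loan_model(x: Dict[str, object]) -> bool:
    # Biased on purpose: one group faces a higher income threshold.
    threshold = 50_000 if x["gender"] == "male" else 70_000
    return x["income"] >= threshold


if __name__ == "__main__":
    print(causal_discrimination_score(toy_loan_model, schema, "gender"))
```

In this toy model only the incomes 50,000 and 60,000 (2 of the 11 levels) receive different decisions by gender, so the estimate should land near 2/11, roughly 0.18. A model that ignores `gender` scores exactly 0.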

Citation:
Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou, Fairness Testing: Testing Software for Discrimination, in Proceedings of the 11th Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), 2017, pp. 498–510 (ACM SIGSOFT Distinguished Paper Award).
Bibtex:
@inproceedings{Galhotra17fse,
  author = {Sainyam Galhotra and Yuriy Brun and Alexandra Meliou},
  title = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Galhotra17fse.pdf}{Fairness Testing: Testing Software for Discrimination}},
  booktitle = {Proceedings of the 11th Joint Meeting of the European
  Software Engineering Conference and ACM SIGSOFT Symposium on the
  Foundations of Software Engineering (ESEC/FSE)},
  venue = {ESEC/FSE},
  month = {September},
  year = {2017},
  date = {6--8},
  address = {Paderborn, Germany},
  pages = {498--510},

  doi = {10.1145/3106237.3106277},
  note = {\raisebox{-.5ex}{\includegraphics[height=2.5ex]{trophy}}~ACM SIGSOFT Distinguished Paper Award.
  \href{https://doi.org/10.1145/3106237.3106277}{DOI: 10.1145/3106237.3106277}, 
  arXiv: \href{https://arxiv.org/abs/1709.03221}{abs/1709.03221}},
  comment = {<span class="emphasis">ACM SIGSOFT Distinguished Paper Award</span>},

  accept = {$\frac{72}{295} \approx 24\%$},

  abstract = {<p>This paper defines the notions of software fairness and
  discrimination and develops a testing-based method for measuring whether,
  and how much, software discriminates. Specifically, the paper focuses on
  measuring causality in discriminatory behavior. Modern software contributes
  to important societal decisions, and evidence of software discrimination has
  been found in systems that recommend criminal sentences, grant access to
  financial loans and products, and determine who is allowed to participate
  in promotions and receive services. Our approach, Themis, measures
  discrimination in software by generating efficient, discrimination-testing
  test suites. Given a schema describing valid system inputs, Themis
  generates discrimination tests automatically and, notably, does not require
  an oracle. We evaluate Themis on 20 software systems, 12 of which come from
  prior work with explicit focus on avoiding discrimination. We find that
  (1) Themis is effective at discovering software discrimination,
  (2) state-of-the-art techniques for removing discrimination from algorithms
  fail in many situations, at times discriminating against as much as 98% of
  an input subdomain, (3) Themis optimizations are effective at producing
  efficient test suites for measuring discrimination, and (4) Themis is more
  efficient on systems that exhibit more discrimination. We thus demonstrate
  that fairness testing is a critical aspect of the software development
  cycle in domains with possible discrimination and provide initial tools for
  measuring software discrimination.</p>},

  fundedBy = {NSF CCF-1453474, NSF IIS-1453543, NSF CNS-1744471},
}