A case for double-blind reviewing in software engineering

April 2015

This letter urges organizers of software engineering conferences to implement double-blind paper reviewing, for three reasons: (1) the scientific evidence that bias exists in reviewing, and that double-blind reviewing reduces such bias, is overwhelming, and this is particularly important in our diversity-challenged field; (2) software engineering is falling behind the top conferences in other areas of computer science on this important issue; and (3) while the potential benefits of double-blind reviewing are significant, its costs and risks are minor.

First, there is a large and growing body of evidence that subconscious biases influence one's ability to evaluate work objectively. This evidence points to bias based on national origin, gender, sexual orientation, and race. The evidence further supports the claim that double-blind reviewing mitigates these biases and thus improves the quality of the review process. A sampling of the relevant studies includes:

In a medical discipline, US reviewers were more likely to recommend acceptance of papers submitted from US-based institutions than of papers submitted from non-US-based institutions, a preference non-US reviewers did not share to the same degree. Critically, removing the institutional information from the submissions mitigated the effect [1].
Faculty at research-focused US universities were less likely to hire female candidates as lab managers than male candidates with identical resumes, and offered the male candidates a starting salary more than 10% higher, on average. Faculty, precisely the group that dominates most research track program committees, viewed the female candidates as less competent than the male candidates [2].
“In 2001, double-blind review was introduced by the journal Behavioral Ecology. Following this policy change, there was a significant increase in female first-authored papers, a pattern not observed in a very similar journal that provides reviewers with author information. No negative effects could be identified, suggesting that double-blind review should be considered by other journals.” [3]
A recent study found that online students give significantly lower evaluation scores when they think the instructor is a woman rather than a man [7]. A related study found a comparable bias based on the instructor's sexual orientation, where the only difference between the presented lectures was the instructor referring to "my partner Jason" vs. "my partner Jennifer" [8].

(Claire Le Goues helped collect the research included in this article [9].)

Importantly, the biases that create these effects are subconscious. The vast majority of academics are well-intentioned, and there are very, very few explicitly racist or sexist reviewers. Still, the evidence shows that these biases affect everyone, regardless of the evaluator's race or gender, in large part because of the societal norms about gender, race, and national origin present in our upbringing and cultural surroundings. To the best of my knowledge, no studies have demonstrated such biases specifically at software engineering venues or in our community. However, the evidence suggests that these biases are broad and affect many academic domains, and, despite our past efforts to be inclusive and fair, there is no reason to believe that our community is immune to biases that affect so many others. Fortunately, existing evidence suggests that double-blind review can mitigate these bias effects in peer review. It is a constructive step toward a review system that evaluates submitted papers objectively, based strictly on the quality of the described work.

Second, software engineering is behind on adopting double-blind reviewing. Many premier conferences in other areas of computer science are double-blind, including PLDI, POPL, CHI, SIGMOD, OSDI, IEEE Security and Privacy (Oakland), SIGCOMM, and SIGIR. (Admittedly, several premier conferences are still single-blind, including STOC, FOCS, SIGGRAPH, VLDB, and NSDI.) The fact that these conferences have adopted double-blind reviewing makes our job easier. For example, a recent instance of PLDI (2013) used CyberChairPRO with double-blind reviewing, working out many of its kinks and greatly reducing our cost of entry. Additionally, many of these conferences have FAQs on the topic that we could borrow from if necessary, for example:

http://www.cs.princeton.edu/~dpw/popl/15/dbr-faq.html#studies.

Third, the cost of switching to double-blind reviewing is low. As previously mentioned, CyberChairPRO already supports it, including entering conflicts of interest. My investigation and personal conversations have uncovered three other common complaints about double-blind reviewing:

"What if I have to refer to my own prior work?" It is hard for me to imagine that more than 10 minutes are required to replace all instances of "We have previously [77]..." with "Brun et al. have previously [77]..." The "it's annoying" argument simply can't deter us. Even just the possibility of bias in paper acceptances is enough to justify asking authors to exert this small extra effort.
"A paper may be rejected because reviewers are aware of closely related work that, in fact, turns out to be by the same authors." This is possible, although I would argue that, to stand on its own, a paper should make a significant contribution beyond prior work and should properly explain how it differs from prior work, regardless of whether that work is by the same or by other authors. Nevertheless, other conferences (e.g., PLDI) have used a variant of double-blind reviewing in which the authors' identities are revealed to a reviewer once that reviewer submits a review. I am personally unconvinced this is necessary, but I do think it is an acceptable compromise. Again, CyberChairPRO supports this.
"I can infer the identities of authors, and therefore blinding submissions is merely inconvenient to authors with no obvious benefit." While many people believe they can unblind submissions, experimental evidence suggests people are less reliable at this task than they think. For example, in the field of public health, reviewers claimed to be able to identify the authors 47% of the time but were wrong 16% of the time [4]; thus, reviewers successfully unblinded submissions only 31% of the time and failed to do so 69% of the time. Other studies of double-blind reviewing with unblinding after review submission show that author identities remain unknown to reviewers 53% to 79% of the time [6]. Moreover, in the worst-case scenario, in which reviewers are able to unblind some papers, the result is the status quo of (partially) single-blind reviewing. There is no risk of the situation becoming worse than if we do not adopt double-blind reviewing at all. Additionally, studies have found that blinding has no measurable impact on the quality of the reviews, as judged by both reviewers and authors (e.g., [5]), so, again, in the worst case, introducing double-blind reviewing will be no worse than the status quo.

Given the significant effect non-double-blind reviewing has on fairness in reviewing and accepting papers, and the critical impact that paper acceptances at premier software engineering venues have on academic careers and advancement, I strongly believe that it is unethical for us to continue using single-blind reviewing. While the costs of implementing double-blind reviewing are non-zero, they have been mitigated in many ways by the other conferences leading the way and by CyberChairPRO's support. The small effort authors have to exert to make their papers suitable for double-blind reviewing, frankly, cannot justify allowing the risk of bias in paper selection.

I strongly urge you to consider changing your conference’s research track to use the double-blind review process.

While I have attempted to compile a convincing argument here, others have tackled this subject previously, and you may find their additional support of double-blind reviewing helpful:

Kathryn McKinley: http://www.cs.utexas.edu/users/mckinley/notes/blind.html
Richard Snodgrass: http://tods.acm.org/editorials/doubleblind.pdf
Claire Le Goues: https://www.cs.cmu.edu/~clegoues/double-blind.html

Yuriy Brun

University of Massachusetts, Amherst,

with support from:

Andrew Begel, Microsoft Research

Ivan Beschastnikh, University of British Columbia

Jeff Huang, Texas A&M University

Miryung Kim, University of California, Los Angeles

Claire Le Goues, Carnegie Mellon University

Wei Le, Iowa State University

Emerson Murphy-Hill, North Carolina State University

Mei Nagappan, Rochester Institute of Technology

Kathryn Stolee, Iowa State University

Emina Torlak, University of Washington

References:

[1] Link, A.M. US and non-US submissions: an analysis of reviewer bias. JAMA 280(3), 1998, 246–247,
http://www.ncbi.nlm.nih.gov/pubmed/9676670

[2] Moss-Racusin, C.A., Dovidio, J.F., Brescoll, V.L., Graham, M.J., and Handelsman, J. Science faculty's subtle gender biases favor male students. Proceedings of the National Academy of Sciences 109(41), 2012, 16474–16479,
http://www.pnas.org/content/109/41/16474.full.pdf

[3] Budden, A.E., Tregenza, T., Aarssen, L.W., Koricheva, J., Leimu, R., and Lortie, C.J. Double-blind review favours increased representation of female authors. Trends Ecol. Evol. 23(1), 2008, 4–6,
http://www.ncbi.nlm.nih.gov/pubmed/17963996

[4] Yankauer, A. How blind is blind review? Am. J. Public Health 81, 1991, 843–845,
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1405201

[5] van Rooyen, S., Godlee, F., Evans, S., Smith, R., and Black, N. Effect of blinding and unmasking on the quality of peer review – A randomized trial. JAMA 280, 1998, 234–237,
http://www.ncbi.nlm.nih.gov/pubmed/9676666

[6] Snodgrass, R. Single- versus double-blind reviewing: An analysis of the literature. SIGMOD Record 35(3), 2006,
http://tods.acm.org/editorials/doubleblind.pdf

[7] MacNell, L., Driscoll, A., and Hunt, A.N. What's in a Name: Exposing Gender Bias in Student Ratings of Teaching. Innovative Higher Education, 2014,
http://dx.doi.org/10.1007/s10755-014-9313-4

[8] Russ, T., Simonds, C., and Hunt, S. Coming Out in the Classroom … An Occupational Hazard?: The Influence of Sexual Orientation on Teacher Credibility and Perceived Student Learning. Communication Education 51(3), 2002, 311–324,
http://dx.doi.org/10.1080/03634520216516

[9] Le Goues, C. SSBSE with Double Blind,
https://www.cs.cmu.edu/~clegoues/double-blind.html

Update: June 2015

The ACM/IEEE International Conference on Software Engineering (ICSE) has formed a task force to investigate the potential costs of switching to double-blind reviewing.

Update: July 2015

The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) steering committee has decided that starting with ISSTA 2016, the research track will employ double-blind reviewing. The commitment is for 3 years.

The IEEE/ACM International Conference on Automated Software Engineering (ASE) steering committee is seriously considering employing double-blind reviewing. The ASE 2016 general and PC chairs are strongly supportive of this move.

Update: August 2015

The International Conference on Fundamental Approaches to Software Engineering (FASE) has implemented lightweight double-blind reviewing for FASE 2016.

Update: June 2016

The International Conference on Software Engineering (ICSE) will implement full double-blind reviewing for ICSE 2018.

Update: September 2017

All major software engineering conferences (ICSE, ESEC/FSE, ASE, ISSTA, ICSA, ICST, etc.) have implemented double-blind reviewing!