UMass Machine Learning and Friends Lunch: Social Data Biases, Methodological Pitfalls, and Social Good Applications

Bio: Alexandra Olteanu is a computational social science and social computing researcher. She is currently a Postdoctoral Researcher in the Fairness, Accountability, Transparency and Ethics (FATE) Group at Microsoft Research Montréal (though she sits with Microsoft Research NYC). Prior to joining the FATE group, she was a Social Good Fellow at the IBM T.J. Watson Research Center, NY. She is interested in how data and methodological limitations delimit what we can learn from online social traces, and how we can make the systems that leverage such data safer, fairer, and generally less biased. The problems she tackles are often motivated by existing societal challenges such as hate speech, racial discrimination, climate change, and disaster relief. Her work has won two best paper awards (WISE 2014, Eurosys' SNS workshop 2012) and has been featured in the UN OCHA's "World Humanitarian Data and Trends" and in popular media outlets, including The Washington Post, VentureBeat, and ZDNet. More recently, she co-authored a survey of biases and methodological pitfalls when working with online social data, and has been co-organizing several tutorials on the topic at major data mining, web, and social media conferences, including ICWSM, KDD, WSDM, WWW, and SDM. She has also served on the program committees of the main social media and web conferences, including ICWSM, WWW, WebSci, CIKM, and SIGIR, on the steering committee of the new ACM Conference on Fairness, Accountability, and Transparency (FAT*), and as the Tutorial Co-chair for ICWSM 2018 and FAT* 2018. Alexandra holds a PhD (2016) from École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. She draws her experience from academic institutions and research labs across five countries.

Abstract: Data-driven computational systems and studies already make assessments about the physical or mental health of individuals, or about their personality or political views, in order to drive policies, to change behaviors, to shape products and services, and for automated decision making. An important source of data for many of these systems is the ever-growing body of online user traces, which promises to offer captivating insights into human phenomena. Alas, some of these systems and studies assume that such social datasets are adequate, often as-is, for the problem at hand, with little or no scrutiny. Yet this is rarely the case.

In this talk, I will challenge such adequacy assumptions and cover several types of biases and limits that surface when leveraging social datasets, related both to the characteristics of these datasets and to the methods for acquiring and analyzing them. I will focus on identifying, quantifying, or minimizing such risks. Understanding these risks is particularly important when tackling significant societal challenges, where the tension between maximizing benefits and minimizing risks is often more palpable. Thus, I will ground the discussion in several social good applications, such as humanitarian crises, news coverage of climate change, minority issues and advocacy, hate speech, and health. I will also give an overview of domain-specific insights to showcase the potential benefits of such applications.