Can we use computation to study society? As computing appears everywhere in daily life, computational techniques could help us understand key social scientific questions. Conversely, as computing becomes more social, insights from social science may help us design better systems for users.
This seminar will consist of readings and presentations on (1) social media analysis and (2) computational social science. Social media is one interesting and recent manifestation of computation in everyday life. It lends itself to studies on topics ranging from mental health to the evolution of slang, the emergence of fads, and the dynamics of social unrest. The richness and magnitude (“Big”-ness) of these data require non-trivial computational methods for analysis, and they raise major questions about validity and representativeness. At the same time, we will investigate the social science literature and consider techniques and insights that may help us understand these phenomena more deeply and better appreciate which questions are important to ask.
Furthermore, we will look at case studies that apply computational methods (such as text analysis/NLP, network analysis, latent-variable statistical models, and agent-based simulations) to understand social phenomena in other contexts. Many of these methods were developed in fields other than CS. For example, in the 1980s, political scientists developed what we would now call an unsupervised machine learning method to infer the ideological positions of legislators from their voting behavior, in order to answer questions about the structure of American politics. Work like this may suggest how related methods could yield insight in other areas.
Readings
1/21 - Overview
Lazer et al. 2009. Computational Social Science. Science. [pdf]
Justin Grimmer, 2015. We’re All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together. PS: Political Science & Politics. [pdf][journal link]
The two papers above represent a conciliatory view of the relationship between social science and Data That Is Large. Some others in this vein include:
Grimmer’s article is one of several articles from a symposium on Big Data, Causal Inference, and Formal Theory: Contradictory Trends in Political Science?, in PS: Political Science & Politics, 2015. [link to several articles]
Yet another overview journal issue just came out: Toward Computational Social Science: Big Data in Digital Environments, in The Annals of the American Academy of Political and Social Science, May 2015; 659 (1). [link to several articles]
Hanna Wallach, 2014. Big Data, Machine Learning, and the Social Sciences. [web]
Ruths and Pfeffer, 2014. Social media for large studies of behavior. [pdf][web]
Einav and Levin, 2014. The data revolution and economic analysis. [pdf]
Wilkerson, Smith, Stramp, 2015. Tracing the Flow of Policy Ideas in Legislatures: A Text Reuse Approach. [pdf]
Clinton, Jackman, Rivers, 2004. The Statistical Analysis of Roll Call Data. [pdf]
Poole and Rosenthal, 1989. Color Animation of Dynamic Congressional Voting Models. [pdf (22M)]
Note: the first reading is standalone, while the second and third readings are both about latent-space models of roll-call voting. CJR covers several related models in depth. The third comes from P&R's earlier, original line of work. We'll treat it as optional, but it may be easier to skim, since it has more discussion of the problem and data, plus visualizations of how the model works (though, as CJR discuss, P&R's model differs somewhat from CJR's).
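The following is a minimal sketch to make the idea of a latent-space roll-call model concrete. It is not CJR's or P&R's actual estimator. It assumes a one-dimensional logistic model, P(yea) = logistic(beta_j * x_i - alpha_j), where x_i is legislator i's ideal point and beta_j and alpha_j describe bill j. CJR instead use a probit link with Bayesian estimation, and P&R's NOMINATE uses a different spatial utility model. Here the parameters are fit by L2-regularized maximum likelihood with plain gradient ascent. The function name `fit_ideal_points` and its parameters (`n_iter`, `lr`, `reg`, `seed`) are hypothetical.

```python
import numpy as np

def fit_ideal_points(votes, n_iter=2000, lr=0.2, reg=0.1, seed=0):
    """Estimate 1-D ideal points from a roll-call matrix (hypothetical sketch).

    votes: (n_legislators, n_bills) float array; 1 = yea, 0 = nay, np.nan = not voting.
    Assumed model (not CJR's Bayesian probit): P(yea) = logistic(beta_j * x_i - alpha_j).
    """
    rng = np.random.default_rng(seed)
    n, m = votes.shape
    observed = ~np.isnan(votes)
    y = np.where(observed, votes, 0.0)
    n_obs_leg = np.maximum(observed.sum(axis=1), 1)   # votes cast by each legislator
    n_obs_bill = np.maximum(observed.sum(axis=0), 1)  # votes cast on each bill
    x = rng.normal(scale=0.1, size=n)     # legislator ideal points
    beta = rng.normal(scale=0.1, size=m)  # bill discrimination (which way a bill cuts)
    alpha = np.zeros(m)                   # bill difficulty (baseline yea rate)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(np.outer(x, beta) - alpha)))
        # The log-likelihood gradient w.r.t. each logit is (y - p); missing votes add nothing.
        resid = np.where(observed, y - p, 0.0)
        # All gradients are computed before any parameter is updated.
        grad_x = resid @ beta / n_obs_leg - reg * x
        grad_beta = resid.T @ x / n_obs_bill - reg * beta
        grad_alpha = -resid.sum(axis=0) / n_obs_bill - reg * alpha
        x += lr * grad_x
        beta += lr * grad_beta
        alpha += lr * grad_alpha
    # x is identified only up to shift, scale, and sign, so standardize it for reporting.
    return (x - x.mean()) / x.std()
```

Because only the relative ordering and spacing of legislators is identified, the sketch standardizes its estimates. Which end of the scale counts as "left" still has to be fixed by hand, for example by anchoring a legislator whose position is known.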
2/4 - Sentiment analysis in social media - Jack and Lucas presenting
Kiritchenko et al., 2014. Sentiment Analysis of Short Informal Texts. JAIR. [pdf]
De Choudhury et al., 2013. Predicting Postpartum Changes in Emotion and Behavior via Social Media. CHI. [pdf]
2/11 - Social indicators - Dan presenting
Paul and Dredze, 2012. Discovering Health Topics in Social Media Using Topic Models. PLOS ONE. [link]
Mitchell and Hitlin, 2013. Twitter Reaction to Events Often at Odds with Overall Public Opinion. Pew Research. [web article]
There are many other articles on this topic. Here are a few not mentioned in the Gayo-Avello review:
Diaz et al. 2014. Online and social media data as a flawed continuous panel survey. [pdf]
Lampos, 2012. On voting intentions inference from Twitter content: a case study on UK 2010 General Election. [arxiv link]
Not sure whether this is superseded by the ACL paper cited in Gayo-Avello.
Huberty, 2013. Multi-cycle forecasting of Congressional elections with social media. PLEAD workshop. [pdf]
Beauchamp, 2014. Predicting and Interpolating State-level Polling using Twitter Textual Data. MPSA. [pdf]
2/25 - Relationships - Ben presenting about online dating; Brendan on regression
Readings on linear and logistic regression, from Gelman and Hill, 2007, “Data Analysis Using Regression and Multilevel/Hierarchical Models”. Read all of Chapter 3 (linear regression), then Sections 5.1-5.5 (logistic regression).
Anderson et al., 2014. Political Ideology and Racial Preferences in Online Dating. Sociological Science. [pdf][supplement]
Shalizi and Thomas, 2010. Homophily and Contagion Are Generically Confounded in Observational Social Network Studies. Sociological Methods & Research, 40(2): 211–239, May 2011. [arxiv version]
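The regression readings can be paired with a small computation. The following is a minimal sketch, assuming NumPy, and is not code from Gelman and Hill. It fits maximum-likelihood logistic regression by iteratively reweighted least squares (i.e., Newton's method). `fit_logistic_irls` is a hypothetical helper, and `X` is assumed to already include a column of ones for the intercept.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Logistic regression MLE via iteratively reweighted least squares (hypothetical helper).

    X: (n, k) design matrix that already includes an intercept column of ones.
    y: (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # inverse logit: fitted probabilities
        w = p * (1.0 - p)                       # IRLS weights (variance of each outcome)
        grad = X.T @ (y - p)                    # score (gradient of log-likelihood)
        hess = X.T @ (X * w[:, None])           # Fisher information, X^T W X
        step = np.linalg.solve(hess, grad)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

In practice a library routine such as statsmodels' `Logit` would be used instead. The point of the sketch is only to show what the fitting step computes: at convergence, the score `X.T @ (y - p)` is zero.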