Ideological Books Corpus
The Ideological Books Corpus (IBC) consists of 4,062 sentences annotated for political ideology at a sub-sentential level, as described in our paper. Specifically, it contains 2,025 liberal sentences, 1,701 conservative sentences, and 600 neutral sentences. Each sentence is represented by a parse tree in which each annotated node carries a label in {liberal, conservative, neutral}.
A 150-sentence sample of the data can be found here, along with a Python script that shows how to access the sentences, phrases, and annotations.
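To make the tree-with-labels structure concrete, here is a minimal Python sketch. It is not the script distributed with the sample, and it does not reflect the actual file format of the IBC. The `Node` class, the `annotated_phrases` helper, and the placeholder words are all hypothetical, chosen only to illustrate a parse tree in which some nodes carry one of the three labels and others are unannotated.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# The three labels named in the corpus description.
LABELS = {"liberal", "conservative", "neutral"}


@dataclass
class Node:
    """Hypothetical parse-tree node; not the IBC's real data format."""
    text: str                    # words spanned by this node
    label: Optional[str] = None  # None means the node was not annotated
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label is not None and self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def annotated_phrases(node: Node) -> Iterator[Tuple[str, str]]:
    """Yield (text, label) for every annotated node, in pre-order."""
    if node.label is not None:
        yield node.text, node.label
    for child in node.children:
        yield from annotated_phrases(child)


# Invented placeholder tree: the root and one phrase are annotated,
# the other phrase is left unannotated.
tree = Node(
    "w1 w2 w3 w4",
    label="liberal",
    children=[
        Node("w1 w2", label="neutral"),
        Node("w3 w4"),
    ],
)

for text, label in annotated_phrases(tree):
    print(f"{label}: {text}")
```

Keeping `label` optional mirrors the description above, where only annotated nodes have a label, so a sentence-level label and phrase-level labels can coexist in the same tree.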
To obtain the full dataset, or if you have any questions or comments about the data, please email me at miyyer@umd.edu.
If you use the IBC in your research, please cite the original IBC paper in addition to ours (e.g., "we used the Ideological Books Corpus (Sim et al., 2013) with sub-sentential annotations (Iyyer et al., 2014) for our work..."):
- Yanchuan Sim, Brice Acree, Justin Gross, and Noah Smith. Measuring Ideological Proportions in Political Speeches. Empirical Methods in Natural Language Processing, 2013.