Partially Labeled Topic Models For Interpretable Text Mining
From Twitter to academic publications, information technologies have enabled companies, organizations, and governments to collect huge datasets about the world, often with large textual components. Social scientists are interested in mining these datasets to improve their understanding of the world, but they need tools to discover and quantify interpretable, trustworthy patterns in the data. In particular, these tools should discover textual trends that align with labels, tags, or other known categories of interest when such metadata is available. I will present a series of probabilistic models of metadata-enriched text, designed to enable big-picture interpretations of text datasets that contain human-interpretable metadata. I will also present an analysis of Twitter using these techniques and an ongoing study of innovation in academia, as manifested in over one million PhD dissertation abstracts (mine not yet included).