Machine Learning and Friends Lunch

Pachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
Wei Li, UMass

Abstract
Latent Dirichlet allocation (LDA) and other related topic models are
increasingly popular tools for summarization and manifold discovery in
discrete data. However, LDA does not capture correlations between
topics. In this paper, we introduce the pachinko allocation model (PAM),
which captures arbitrary, nested, and possibly sparse correlations
between topics using a directed acyclic graph (DAG). The leaves of the
DAG represent individual words in the vocabulary, while each interior
node represents a correlation among its children, which may be words or
other interior nodes (topics). PAM provides a flexible alternative to
recent work by Blei and Lafferty (2006), which captures correlations
only between pairs of topics. Using text data from newsgroups, historic
NIPS proceedings and other research paper corpora, we show improved
performance of PAM in document classification, likelihood of held-out
data, the ability to support finer-grained topics, and topical keyword
coherence.
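
The DAG structure described in the abstract can be illustrated with a minimal sketch of how a document might be generated. This is not the speaker's implementation. It assumes each interior node draws a per-document multinomial over its children from a symmetric Dirichlet, and that each word is produced by walking from the root to a leaf. The toy graph, the node names, and the helpers `sample_dirichlet` and `generate_document` are all hypothetical.

```python
import random

# Hypothetical toy DAG: each key is an interior node (topic) and maps to
# its children. Any name that is not a key is a leaf, i.e. a vocabulary
# word. "fitness" has two parents, so this is a DAG rather than a tree.
DAG = {
    "root": ["sports", "health"],
    "sports": ["ball_games", "fitness"],
    "health": ["fitness", "medicine"],
    "ball_games": ["ball", "goal", "team"],
    "fitness": ["training", "team", "diet"],
    "medicine": ["doctor", "diet", "drug"],
}


def sample_dirichlet(rng, n, alpha=1.0):
    """Draw n proportions from a symmetric Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(dag, n_words, rng, alpha=1.0):
    """Sample n_words words by repeated root-to-leaf walks through the DAG."""
    # Assumption: one multinomial over children per interior node,
    # redrawn for every document.
    theta = {
        node: sample_dirichlet(rng, len(children), alpha)
        for node, children in dag.items()
    }
    words = []
    for _ in range(n_words):
        node = "root"
        # Descend until we reach a leaf (a word, not a key of the DAG).
        while node in dag:
            node = rng.choices(dag[node], weights=theta[node], k=1)[0]
        words.append(node)
    return words


if __name__ == "__main__":
    print(generate_document(DAG, 10, random.Random(0)))
```

Because the child proportions are drawn per document, a document that favors "sports" at the root also tends to favor whatever "sports" points to, which is one way to picture how interior nodes encode correlations among their children.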