Dirichlet–multinomial mixture model: known groups
Data
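Assume we observe a set of documents, with
$N$: number of tokens
$D$: number of documents
$V$: vocabulary size
$K$: number of topics
Tokens: $w = (w_1, \dots, w_N)$, $w_n \in \{1, \dots, V\}$, where token $n$ belongs to document $d_n \in \{1, \dots, D\}$
Topic of each document: $z = (z_1, \dots, z_D)$, $z_d \in \{1, \dots, K\}$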
Latent variables
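Token distribution of the $k$-th topic: $\phi_k = (\phi_{k1}, \dots, \phi_{kV})$, with $P(w_n = v \mid z_{d_n} = k, \phi_k) = \phi_{kv}$
Independence:
$w_n$ is independent of $w_{n'}$ for $n \neq n'$, given the topic of the document and that topic's token distribution.
$z_1, \dots, z_D$ are i.i.d.
Topic distribution: $\theta = (\theta_1, \dots, \theta_K)$, with $P(z_d = k \mid \theta) = \theta_k$
Overall we have
$$P(w, z \mid \Phi, \theta) = \prod_{d=1}^{D} \theta_{z_d} \prod_{n=1}^{N} \phi_{z_{d_n} w_n},$$
where $\Phi = (\phi_1, \dots, \phi_K)$ and $d_n$ is the document containing token $n$.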
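The generative process described above (one topic per document, then i.i.d. tokens from that topic's token distribution) can be sketched in a few lines. This is only an illustration: the function name `sample_corpus`, the fixed document lengths, and the example values of `theta` and `phi` are assumptions, not part of the notes.

```python
import random

def sample_corpus(theta, phi, doc_lengths, seed=0):
    """Draw one topic per document, then each token from that topic's distribution.

    theta: list of K topic probabilities (assumed to sum to 1)
    phi:   K lists of V token probabilities (each assumed to sum to 1)
    """
    rng = random.Random(seed)
    topics = list(range(len(theta)))
    vocab = list(range(len(phi[0])))
    z, docs = [], []
    for length in doc_lengths:
        topic = rng.choices(topics, weights=theta)[0]   # topic of this document
        z.append(topic)
        # tokens are i.i.d. given the document's topic
        docs.append(rng.choices(vocab, weights=phi[topic], k=length))
    return z, docs

# Hypothetical parameters: 2 topics, vocabulary of 3 token types.
theta = [0.7, 0.3]
phi = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
z, docs = sample_corpus(theta, phi, doc_lengths=[4, 2, 5])
```

In the known-groups setting both `z` and `docs` are treated as observed; only the topic and token distributions are latent.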
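Prior
$\phi_k \sim \mathrm{Dir}(\beta)$ independently for each topic $k$, with $\beta = (\beta_1, \dots, \beta_V)$
$\theta \sim \mathrm{Dir}(\alpha)$, with $\alpha = (\alpha_1, \dots, \alpha_K)$
Overall prior
$$P(\Phi, \theta) = \mathrm{Dir}(\theta \mid \alpha) \prod_{k} \mathrm{Dir}(\phi_k \mid \beta)$$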
Notation
- topic $k$ is responsible for $D_k$ documents
- total $N_k$ tokens in those documents associated with topic $k$
- $N_{kv}$ tokens of type $v$ associated with topic $k$, so that $N_k = \sum_v N_{kv}$
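The three counts in this notation are all that the later formulas need from the data. A minimal sketch of how they might be tallied, assuming documents are stored as lists of integer token types and topics as a parallel list of integers (the function name `count_statistics` and this data layout are assumptions made for illustration):

```python
def count_statistics(docs, z, num_topics, vocab_size):
    """Tally documents per topic, tokens per topic, and token types per topic."""
    docs_per_topic = [0] * num_topics
    tokens_per_topic = [0] * num_topics
    type_counts = [[0] * vocab_size for _ in range(num_topics)]
    for tokens, topic in zip(docs, z):
        docs_per_topic[topic] += 1
        tokens_per_topic[topic] += len(tokens)
        for v in tokens:
            type_counts[topic][v] += 1
    return docs_per_topic, tokens_per_topic, type_counts

# Toy corpus: three documents over a vocabulary of 3 types, 2 topics.
docs = [[0, 0, 1], [2, 2], [0, 1]]
z = [0, 1, 0]
docs_per_topic, tokens_per_topic, type_counts = count_statistics(
    docs, z, num_topics=2, vocab_size=3
)
# docs_per_topic   -> [2, 1]
# tokens_per_topic -> [5, 2]
# type_counts      -> [[3, 2, 0], [0, 0, 2]]
```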
Likelihood
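Let $z_{d_n}$ denote the topic for token $n$ for all $n = 1, \dots, N$, where $d_n$ is the document containing token $n$.
The likelihood is therefore
$$P(w, z \mid \Phi, \theta) = \prod_{k} \theta_k^{D_k} \prod_{k} \prod_{v} \phi_{kv}^{N_{kv}},$$
where $D_k$ is the number of documents of topic $k$ and $N_{kv}$ is the number of tokens of type $v$ in those documents.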
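Because every token inherits the topic of its document, the likelihood depends on the data only through the per-topic counts. A sketch of the corresponding log-likelihood, assuming the counts are already available as plain lists (the function name `log_likelihood` and the example parameter values are hypothetical):

```python
import math

def log_likelihood(docs_per_topic, type_counts, theta, phi):
    """Sum of count * log(probability) over topics and token types."""
    total = 0.0
    for k, d_k in enumerate(docs_per_topic):
        if d_k:
            total += d_k * math.log(theta[k])          # document-topic terms
        for v, n_kv in enumerate(type_counts[k]):
            if n_kv:                                    # skip zero counts, avoids log(0)
                total += n_kv * math.log(phi[k][v])     # token terms
    return total

# Hypothetical counts and parameters (2 topics, 3 token types).
ll = log_likelihood(
    docs_per_topic=[2, 1],
    type_counts=[[3, 2, 0], [0, 0, 2]],
    theta=[0.5, 0.5],
    phi=[[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]],
)
```

For these values every factor is a power of 1/2, so the result equals $-12 \log 2$.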
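Evidence
Integrating $\theta$ and $\Phi$ out against their Dirichlet priors with parameters $\alpha$ and $\beta$ gives
$$P(w, z) = \frac{\Gamma(\alpha_0)}{\Gamma(D + \alpha_0)} \prod_{k} \frac{\Gamma(\alpha_k + D_k)}{\Gamma(\alpha_k)} \prod_{k} \left[ \frac{\Gamma(\beta_0)}{\Gamma(N_k + \beta_0)} \prod_{v} \frac{\Gamma(\beta_v + N_{kv})}{\Gamma(\beta_v)} \right],$$
where $\alpha_0 = \sum_k \alpha_k$ and $\beta_0 = \sum_v \beta_v$.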
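With Dirichlet priors, the evidence is a product of Dirichlet-multinomial terms: one for the document-topic counts and one per topic for the token-type counts. The sketch below evaluates it in log space with `math.lgamma`; the function names and the assumption that all topics share one token prior `beta` are illustrative choices.

```python
import math

def log_dirichlet_multinomial(counts, prior):
    """log[ B(prior + counts) / B(prior) ], with B the multivariate beta function."""
    out = math.lgamma(sum(prior)) - math.lgamma(sum(prior) + sum(counts))
    for c, a in zip(counts, prior):
        out += math.lgamma(a + c) - math.lgamma(a)
    return out

def log_evidence(docs_per_topic, type_counts, alpha, beta):
    """Log marginal probability of tokens and document topics."""
    total = log_dirichlet_multinomial(docs_per_topic, alpha)   # topic assignments
    for counts in type_counts:                                  # tokens, per topic
        total += log_dirichlet_multinomial(counts, beta)
    return total

# Smallest non-trivial case: one document with one token, uniform priors.
log_ev = log_evidence(
    docs_per_topic=[1, 0],
    type_counts=[[1, 0], [0, 0]],
    alpha=[1.0, 1.0],
    beta=[1.0, 1.0],
)
```

In this case the topic has prior predictive probability 1/2 and so does the token, so `math.exp(log_ev)` is 1/4.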
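Posterior
By Dirichlet-multinomial conjugacy, the posterior factorizes as
$$\theta \mid z \sim \mathrm{Dir}(\alpha_1 + D_1, \dots, \alpha_K + D_K), \qquad \phi_k \mid w, z \sim \mathrm{Dir}(\beta_1 + N_{k1}, \dots, \beta_V + N_{kV}),$$
where $\alpha$ and $\beta$ are the Dirichlet prior parameters of $\theta$ and $\phi_k$.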
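Conjugacy means the posterior update amounts to adding the observed counts to the prior parameters. A small sketch, assuming list-based counts and a shared token prior `beta` (the helper names are hypothetical):

```python
def posterior_parameters(docs_per_topic, type_counts, alpha, beta):
    """Dirichlet posterior parameters: prior pseudo-counts plus observed counts."""
    theta_post = [a + d for a, d in zip(alpha, docs_per_topic)]
    phi_post = [[b + n for b, n in zip(beta, row)] for row in type_counts]
    return theta_post, phi_post

def dirichlet_mean(params):
    """Mean of a Dirichlet distribution with the given parameters."""
    total = sum(params)
    return [p / total for p in params]

theta_post, phi_post = posterior_parameters(
    docs_per_topic=[2, 1],
    type_counts=[[3, 2, 0], [0, 0, 2]],
    alpha=[1, 1],
    beta=[1, 1, 1],
)
theta_mean = dirichlet_mean(theta_post)
# theta_post -> [3, 2]
# phi_post   -> [[4, 3, 1], [1, 1, 3]]
# theta_mean -> [0.6, 0.4]
```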
Prediction
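Consider the case where the new data consists of a single token $w^*$ in a new document. Summing over the unknown topic of that document,
$$P(w^* = v \mid w, z) = \sum_{k} \frac{\alpha_k + D_k}{\alpha_0 + D} \cdot \frac{\beta_v + N_{kv}}{\beta_0 + N_k},$$
where $\alpha_0 = \sum_k \alpha_k$ and $\beta_0 = \sum_v \beta_v$ are the sums of the Dirichlet prior parameters.
For the case of a single token $w^*$ in an existing document $d$, where $d$ is of group $k$, the predictive probability is
$$P(w^* = v \mid z_d = k, w, z) = \frac{\beta_v + N_{kv}}{\beta_0 + N_k}.$$
For a new dataset consisting of multiple documents $(w^*, z^*)$ with counts $D^*_k$, $N^*_k$ and $N^*_{kv}$ defined as above, the predictive probability is the ratio of evidences
$$P(w^*, z^* \mid w, z) = \frac{P(w^*, z^*, w, z)}{P(w, z)} = \frac{\Gamma(\alpha_0 + D)}{\Gamma(\alpha_0 + D + D^*)} \prod_{k} \frac{\Gamma(\alpha_k + D_k + D^*_k)}{\Gamma(\alpha_k + D_k)} \prod_{k} \left[ \frac{\Gamma(\beta_0 + N_k)}{\Gamma(\beta_0 + N_k + N^*_k)} \prod_{v} \frac{\Gamma(\beta_v + N_{kv} + N^*_{kv})}{\Gamma(\beta_v + N_{kv})} \right].$$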
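The two single-token cases can be sketched directly from the counts: for an existing document the topic is known, while for a new document the topic is averaged out using the posterior over topics. The function names, argument layout, and example numbers below are assumptions for illustration only.

```python
def predict_token_existing_doc(v, k, tokens_per_topic, type_counts, beta):
    """P(next token = v) in an existing document known to belong to topic k."""
    return (beta[v] + type_counts[k][v]) / (sum(beta) + tokens_per_topic[k])

def predict_token_new_doc(v, docs_per_topic, tokens_per_topic, type_counts, alpha, beta):
    """P(token = v) in a new document, averaging over its unknown topic."""
    total_docs = sum(alpha) + sum(docs_per_topic)
    prob = 0.0
    for k in range(len(docs_per_topic)):
        topic_prob = (alpha[k] + docs_per_topic[k]) / total_docs
        prob += topic_prob * predict_token_existing_doc(
            v, k, tokens_per_topic, type_counts, beta
        )
    return prob

# Hypothetical counts: 2 topics, 3 token types, uniform priors.
D_k, N_k = [2, 1], [5, 2]
N_kv = [[3, 2, 0], [0, 0, 2]]
alpha, beta = [1, 1], [1, 1, 1]

p_existing = predict_token_existing_doc(0, 0, N_k, N_kv, beta)       # (1 + 3) / (3 + 5)
p_new = predict_token_new_doc(0, D_k, N_k, N_kv, alpha, beta)        # 0.6 * 0.5 + 0.4 * 0.2
p_all = [predict_token_new_doc(v, D_k, N_k, N_kv, alpha, beta) for v in range(3)]
```

The entries of `p_all` sum to one, which is a quick sanity check that the predictive is a proper distribution over token types.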