Dirichlet–multinomial mixture model: exploration and prediction
Recap
$\mathbf{w}$: tokens in the documents
The posterior of the Dirichlet–multinomial mixture model is given by

$$p(\theta, \phi \mid \mathbf{w}) = \sum_{\mathbf{z}} p(\theta, \phi \mid \mathbf{z}, \mathbf{w})\, p(\mathbf{z} \mid \mathbf{w}),$$

where $p(\theta, \phi \mid \mathbf{z}, \mathbf{w})$ is composed of Dirichlet distributions: by conjugacy, given the assignments $\mathbf{z}$, the posterior factorizes into a Dirichlet over the mixture weights $\theta$ and one Dirichlet over each topic's word distribution $\phi_k$.
The other term can be obtained by

$$p(\mathbf{z} \mid \mathbf{w}) = \frac{p(\mathbf{z}, \mathbf{w})}{\sum_{\mathbf{z}'} p(\mathbf{z}', \mathbf{w})},$$

in which the normalization over all assignments is intractable, so $p(\mathbf{z} \mid \mathbf{w})$ can be computed approximately using Gibbs sampling.
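As a concrete sketch of such a sampler (the one-topic-per-document model structure, the symmetric priors `alpha` and `beta`, and all names are assumptions for illustration, not taken from the text), a minimal collapsed Gibbs sampler for a Dirichlet–multinomial mixture could look like:

```python
import numpy as np

def gibbs_dmm(docs, K, V, alpha, beta, n_iters, rng):
    """Collapsed Gibbs sampler for a Dirichlet-multinomial mixture.

    docs: list of integer arrays (token ids in [0, V)); each document
    has a single topic z_d. Returns one sample of z per iteration.
    """
    D = len(docs)
    z = rng.integers(K, size=D)
    n_k = np.zeros(K)        # number of documents per topic
    n_kv = np.zeros((K, V))  # token counts per topic and word
    for d, w in enumerate(docs):
        n_k[z[d]] += 1
        np.add.at(n_kv[z[d]], w, 1)
    samples = []
    for _ in range(n_iters):
        for d, w in enumerate(docs):
            # Remove document d from the counts.
            n_k[z[d]] -= 1
            np.add.at(n_kv[z[d]], w, -1)
            # Full conditional p(z_d = k | z_{-d}, w) in log space.
            logp = np.log(n_k + alpha)
            for k in range(K):
                counts, tot = n_kv[k].copy(), n_kv[k].sum()
                for t in w:  # sequential Polya predictive for doc d
                    logp[k] += np.log((counts[t] + beta) / (tot + V * beta))
                    counts[t] += 1
                    tot += 1
            p = np.exp(logp - logp.max())
            z[d] = rng.choice(K, p=p / p.sum())
            n_k[z[d]] += 1
            np.add.at(n_kv[z[d]], w, 1)
        samples.append(z.copy())
    return samples
```

Each saved `z` is one (correlated) draw from $p(\mathbf{z} \mid \mathbf{w})$; in practice one would discard a burn-in prefix and thin the chain.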
Exploration

To explore the corpus, we estimate the posterior mean of the topic–word distributions:

$$\mathbb{E}[\phi \mid \mathbf{w}] = \sum_{\mathbf{z}} \mathbb{E}[\phi \mid \mathbf{z}, \mathbf{w}]\, p(\mathbf{z} \mid \mathbf{w}),$$

where $\mathbb{E}[\phi \mid \mathbf{z}, \mathbf{w}]$ is the mean of the Dirichlet posterior. For a single component of $\phi$,

$$\mathbb{E}[\phi_{k,v} \mid \mathbf{z}, \mathbf{w}] = \frac{n_{k,v} + \beta}{\sum_{v'} \left( n_{k,v'} + \beta \right)},$$

with $n_{k,v}$ the number of times word $v$ is assigned to topic $k$ under $\mathbf{z}$. For the intractable sum over $\mathbf{z}$, we can approximate it using a sampling technique: draw samples $\mathbf{z}^{(1)}, \dots, \mathbf{z}^{(S)} \sim p(\mathbf{z} \mid \mathbf{w})$ by Gibbs sampling.
Can we do the following Monte Carlo approximation?

$$\mathbb{E}[\phi \mid \mathbf{w}] \approx \frac{1}{S} \sum_{s=1}^{S} \mathbb{E}[\phi \mid \mathbf{z}^{(s)}, \mathbf{w}]$$

No, because of label switching: the component labels are not aligned across samples.
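To see why the answer is no, consider a toy illustration (the two-topic setup and the numbers are invented for the example): if half of the Gibbs samples carry the swapped labeling of the same two topics, the naive per-component average destroys both topics.

```python
import numpy as np

# Two "true" topic-word distributions over a 4-word vocabulary.
phi = np.array([[0.70, 0.20, 0.05, 0.05],
                [0.05, 0.05, 0.20, 0.70]])

# Pretend half of the Gibbs samples had the two labels swapped.
samples = [phi, phi[::-1], phi, phi[::-1]]

# Naive Monte Carlo average over samples, per component.
phi_hat = np.mean(samples, axis=0)
# Both rows collapse to the same blend [0.375, 0.125, 0.125, 0.375]:
# the per-component estimate is meaningless under label switching.
```

The averaged rows no longer resemble either true topic, even though every individual sample contained both topics exactly.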
Label switching
Label switching happens for several reasons:

- the likelihood is invariant under relabeling, so there are $K!$ permutations for an identical mode
- the posterior has multiple modes
Label switching also happens within a single Gibbs run; if it never does, the Markov chain is not mixing well.
Dealing with label switching
- Matching up topics across samples: very hard in practice, and even harder when the distribution is multi-modal.
- Averaging samples within a small region of the Markov chain: there is no guarantee that label switching does not happen among those samples.
- Using only one sample, in particular the sample with the highest probability.
In our case, we use a single sample to estimate the latent parameter:

$$\hat{\phi} = \mathbb{E}[\phi \mid \mathbf{z}^{(s^*)}, \mathbf{w}], \qquad s^* = \arg\max_{s}\, p(\mathbf{z}^{(s)}, \mathbf{w}).$$
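Selecting the highest-probability sample requires evaluating the joint $p(\mathbf{z}, \mathbf{w})$ up to a constant. A sketch, assuming the standard collapsed DMM joint with one topic per document and symmetric Dirichlet priors `alpha`, `beta` (all function and variable names here are invented for illustration):

```python
import math
import numpy as np

def log_joint(z, docs, K, V, alpha, beta):
    """log p(z, w) for a DMM, up to an additive constant.

    Collapsing theta and phi gives products of Polya
    (Dirichlet-multinomial) terms; constants shared by
    every assignment z are dropped.
    """
    n_k = np.bincount(z, minlength=K)
    lp = sum(math.lgamma(n + alpha) for n in n_k)  # from integrating theta
    n_kv = np.zeros((K, V))
    for d, w in enumerate(docs):
        np.add.at(n_kv[z[d]], w, 1)
    for k in range(K):  # from integrating each phi_k
        lp += sum(math.lgamma(c + beta) for c in n_kv[k])
        lp -= math.lgamma(n_kv[k].sum() + V * beta)
    return lp

def best_sample(samples, docs, K, V, alpha, beta):
    """Pick the Gibbs sample z^(s) with the highest joint probability."""
    scores = [log_joint(z, docs, K, V, alpha, beta) for z in samples]
    return samples[int(np.argmax(scores))]
```

Because only the argmax matters, the dropped normalization constants are harmless.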
Prediction

The predictive probability of new tokens $\mathbf{w}_{\text{new}}$ is

$$p(\mathbf{w}_{\text{new}} \mid \mathbf{w}) = \sum_{\mathbf{z}} p(\mathbf{w}_{\text{new}} \mid \mathbf{z}, \mathbf{w})\, p(\mathbf{z} \mid \mathbf{w}) \approx \frac{1}{S} \sum_{s=1}^{S} p(\mathbf{w}_{\text{new}} \mid \mathbf{z}^{(s)}, \mathbf{w}).$$

Alternatively,

$$p(\mathbf{w}_{\text{new}} \mid \mathbf{w}) = \int p(\mathbf{w}_{\text{new}} \mid \theta, \phi)\, p(\theta, \phi \mid \mathbf{w})\, \mathrm{d}\theta\, \mathrm{d}\phi.$$
Single new token

For a single new token $w_{\text{new}}$ in a new document,

$$p(w_{\text{new}} = v \mid \mathbf{z}^{(s)}, \mathbf{w}) = \sum_{k=1}^{K} \hat{\theta}^{(s)}_k\, \hat{\phi}^{(s)}_{k,v},$$

where $\hat{\theta}^{(s)}_k = \mathbb{E}[\theta_k \mid \mathbf{z}^{(s)}]$ and $\hat{\phi}^{(s)}_{k,v} = \mathbb{E}[\phi_{k,v} \mid \mathbf{z}^{(s)}, \mathbf{w}]$.

The Monte Carlo method here is not susceptible to label switching.
Each term $p(w_{\text{new}} = v \mid \mathbf{z}^{(s)}, \mathbf{w})$ is an approximation of a probability that sums over all components, and the sum is invariant to relabeling, even if the component estimate $\hat{\phi}^{(s)}_{k,v}$ for a particular $k$ is susceptible to label switching.
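A minimal sketch of this estimator (the symmetric priors, posterior-mean formulas, and all names are assumptions for illustration): it averages $\sum_k \hat{\theta}^{(s)}_k \hat{\phi}^{(s)}_{k,v}$ over samples, so permuting the labels of any sample leaves the result unchanged.

```python
import numpy as np

def predict_token(samples, docs, K, V, alpha, beta):
    """p(w_new = v | w) for each word v, averaged over Gibbs samples.

    Sums over components inside each sample, so relabeling any z^(s)
    does not change the result (no label-switching problem).
    """
    D = len(docs)
    probs = np.zeros(V)
    for z in samples:
        n_k = np.bincount(z, minlength=K)
        n_kv = np.zeros((K, V))
        for d, w in enumerate(docs):
            np.add.at(n_kv[z[d]], w, 1)
        theta_hat = (n_k + alpha) / (D + K * alpha)  # E[theta | z]
        phi_hat = (n_kv + beta) / (n_kv.sum(1, keepdims=True) + V * beta)
        probs += theta_hat @ phi_hat                 # sum over components
    return probs / len(samples)
```

The relabeling invariance can be checked directly: flipping all labels of a two-topic sample gives exactly the same predictive distribution.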
Multiple new tokens

For multiple new tokens $\mathbf{w}_{\text{new}}$ forming a new document,

$$p(\mathbf{w}_{\text{new}} \mid \mathbf{z}^{(s)}, \mathbf{w}) = \sum_{k=1}^{K} p(z_{\text{new}} = k \mid \mathbf{z}^{(s)})\, p(\mathbf{w}_{\text{new}} \mid z_{\text{new}} = k, \mathbf{z}^{(s)}, \mathbf{w}),$$

where $p(z_{\text{new}} = k \mid \mathbf{z}^{(s)}) = \dfrac{n^{(s)}_k + \alpha}{D + K\alpha}$ and

$$p(\mathbf{w}_{\text{new}} \mid z_{\text{new}} = k, \mathbf{z}^{(s)}, \mathbf{w}) = \frac{\Gamma\!\left(\sum_{v}\left(n^{(s)}_{k,v} + \beta\right)\right)}{\Gamma\!\left(\sum_{v}\left(n^{(s)}_{k,v} + m_v + \beta\right)\right)} \prod_{v} \frac{\Gamma\!\left(n^{(s)}_{k,v} + m_v + \beta\right)}{\Gamma\!\left(n^{(s)}_{k,v} + \beta\right)}$$

for $m_v$ the number of occurrences of word $v$ in $\mathbf{w}_{\text{new}}$.

Overall, the predictive probability is given by

$$p(\mathbf{w}_{\text{new}} \mid \mathbf{w}) \approx \frac{1}{S} \sum_{s=1}^{S} p(\mathbf{w}_{\text{new}} \mid \mathbf{z}^{(s)}, \mathbf{w}),$$

where the counts $n^{(s)}_k$ and $n^{(s)}_{k,v}$ are constants in a Gibbs sampling run.
- Not susceptible to label switching, since each term sums over all components.
- Computationally expensive: in practice only one sample (the one with the highest probability) is used, that is,

$$p(\mathbf{w}_{\text{new}} \mid \mathbf{w}) \approx p(\mathbf{w}_{\text{new}} \mid \mathbf{z}^{(s^*)}, \mathbf{w}), \qquad s^* = \arg\max_{s}\, p(\mathbf{z}^{(s)}, \mathbf{w}).$$