Dirichlet–multinomial mixture model: exploration and prediction
Recap
- $w_{1:N}$: tokens in the document
- $z_{1:N}$: latent component assignments, $z_i \in \{1, \dots, K\}$
- $\theta \sim \mathrm{Dir}(\alpha)$: mixing proportions over the $K$ components
- $\phi_k \sim \mathrm{Dir}(\beta)$: token distribution of component $k$
The posterior of the Dirichlet–multinomial mixture model is given by
$$p(\theta, \phi \mid w) = \sum_{z} p(\theta, \phi \mid z, w)\, p(z \mid w),$$
where $p(\theta, \phi \mid z, w)$ is composed of Dirichlet distributions (conditioned on $z$, conjugacy makes each factor a Dirichlet). The other term can be obtained by
$$p(z \mid w) = \frac{p(w \mid z)\, p(z)}{\sum_{z'} p(w \mid z')\, p(z')},$$
in which the normalizing sum is intractable; samples from $p(z \mid w)$ can instead be drawn using Gibbs sampling.
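As a concrete sketch of the sampler (not spelled out in the notes), a collapsed Gibbs sweep over $p(z \mid w)$ for a token-level mixture with symmetric priors $\alpha$, $\beta$ might look like the following; the function and variable names are illustrative assumptions.

```python
# Hypothetical collapsed Gibbs sweep for the Dirichlet-multinomial mixture:
# theta ~ Dir(alpha), phi_k ~ Dir(beta), z_i ~ Cat(theta), w_i ~ Cat(phi_{z_i}).
import numpy as np

def gibbs_sweep(w, z, K, V, alpha, beta, rng):
    """One in-place sweep of collapsed Gibbs sampling over p(z | w)."""
    n_k = np.bincount(z, minlength=K).astype(float)   # tokens per component
    n_kv = np.zeros((K, V))                           # per-component token counts
    np.add.at(n_kv, (z, w), 1.0)
    for i, v in enumerate(w):
        k = z[i]
        n_k[k] -= 1.0                                  # remove token i from counts
        n_kv[k, v] -= 1.0
        # p(z_i = k | z_{-i}, w), up to a normalizing constant
        p = (n_k + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
        k_new = rng.choice(K, p=p / p.sum())
        z[i] = k_new                                   # add it back with new label
        n_k[k_new] += 1.0
        n_kv[k_new, v] += 1.0
    return z
```

Repeating the sweep and recording `z` after a burn-in period yields the samples $z^{(1)}, \dots, z^{(S)}$ used below.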
Exploration
We want posterior summaries of the parameters, such as the posterior mean
$$\mathbb{E}[\theta, \phi \mid w] = \sum_{z} \mathbb{E}[\theta, \phi \mid z, w]\, p(z \mid w),$$
where $\mathbb{E}[\theta, \phi \mid z, w]$ is available in closed form because the conditional posterior is Dirichlet. For a single component of $\theta$,
$$\mathbb{E}[\theta_k \mid z, w] = \frac{\alpha + n_k}{K\alpha + N}, \qquad n_k = |\{i : z_i = k\}|.$$
For the intractable sums, we can approximate them using a sampling technique: draw samples $z^{(1)}, \dots, z^{(S)}$ by Gibbs sampling.
Can we do the following Monte Carlo approximation?
$$\mathbb{E}[\theta_k \mid w] \approx \frac{1}{S} \sum_{s=1}^{S} \mathbb{E}[\theta_k \mid z^{(s)}, w]$$
No, because of label switching: the label $k$ need not refer to the same component in different samples.
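A toy illustration (not from the notes) of why the naive average fails: averaging a component vector over all label permutations collapses it toward the uniform vector, which is exactly what cross-sample averaging does when the chain visits permuted relabelings of the same mode.

```python
# Averaging one component vector over all K! label permutations
# wipes out its structure -- every entry collapses to 1/K.
import itertools
import numpy as np

phi_k = np.array([0.8, 0.15, 0.05])                          # one "true" component
perms = [phi_k[list(p)] for p in itertools.permutations(range(3))]
averaged = np.mean(perms, axis=0)                            # component-wise average
print(averaged)                                              # every entry is 1/3
```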
Label switching
Label switching happens for several reasons:
- the likelihood is invariant under the $K!$ permutations of the component labels, so each mode has identical permuted copies
- the posterior can also have multiple genuine modes
Label switching also happens within a single Gibbs run; if it never does, the Markov chain is not mixing well.
Dealing with label switching
- Matching up topics: very hard in practice. It is even harder when the distribution is multi-modal.
- Averaging samples within a small region in the Markov chain: there is no guarantee that label switching does not happen on those samples.
- Using only one sample, in particular, the sample with the highest probability.
In our case, we use a single sample to estimate the latent parameters:
$$\mathbb{E}[\theta, \phi \mid w] \approx \mathbb{E}[\theta, \phi \mid z^{*}, w], \qquad z^{*} = \operatorname*{arg\,max}_{s}\, p(z^{(s)} \mid w).$$
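The conditional posterior means given a single sample $z^{*}$ have closed forms by conjugacy. A minimal sketch, assuming symmetric priors and illustrative names:

```python
# Closed-form E[theta | z*, w] and E[phi | z*, w] for one assignment z*.
import numpy as np

def posterior_means(w, z_star, K, V, alpha, beta):
    n_k = np.bincount(z_star, minlength=K).astype(float)   # tokens per component
    n_kv = np.zeros((K, V))                                # per-component token counts
    np.add.at(n_kv, (z_star, w), 1.0)
    theta_hat = (n_k + alpha) / (len(w) + K * alpha)       # E[theta | z*, w]
    phi_hat = (n_kv + beta) / (n_k[:, None] + V * beta)    # E[phi_k | z*, w], row k
    return theta_hat, phi_hat
```

Both estimates are proper distributions: `theta_hat` sums to 1 and each row of `phi_hat` sums to 1.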
Prediction
The predictive probability of new tokens $\tilde{w}$ is
$$p(\tilde{w} \mid w) = \int p(\tilde{w} \mid \theta, \phi)\, p(\theta, \phi \mid w)\, d\theta\, d\phi.$$
Alternatively, marginalizing over the assignments instead,
$$p(\tilde{w} \mid w) = \sum_{z} p(\tilde{w} \mid z, w)\, p(z \mid w).$$
Single new token
$$p(\tilde{w} \mid w) \approx \frac{1}{S} \sum_{s=1}^{S} p(\tilde{w} \mid z^{(s)}, w),$$
where $p(\tilde{w} = v \mid z, w) = \sum_{k} \mathbb{E}[\theta_k \mid z, w]\, \mathbb{E}[\phi_{k,v} \mid z, w]$.
The Monte Carlo method here is not susceptible to label switching. Each $p(\tilde{w} \mid z^{(s)}, w)$ is an approximation of a probability, and it sums over all components, so it is unaffected even if the identity of a particular component $k$ switches between samples.
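The single-token predictive can be sketched directly from the closed-form conditional means; as before, the names and the assumption of symmetric priors are mine, not from the notes.

```python
# Monte Carlo estimate of p(w_tilde = v | w): average over Gibbs samples z of
# sum_k E[theta_k | z, w] * E[phi_{k,v} | z, w].
import numpy as np

def predictive_token_prob(v, samples, w, K, V, alpha, beta):
    total = 0.0
    for z in samples:
        n_k = np.bincount(z, minlength=K).astype(float)
        n_kv = np.zeros((K, V))
        np.add.at(n_kv, (z, w), 1.0)
        theta = (n_k + alpha) / (len(w) + K * alpha)       # E[theta | z, w]
        phi_v = (n_kv[:, v] + beta) / (n_k + V * beta)     # E[phi_{k,v} | z, w]
        total += float(theta @ phi_v)                      # sum over components k
    return total / len(samples)
```

Because each per-sample term sums over all $k$, summing the estimate over all $v$ gives exactly 1, regardless of how labels are permuted across samples.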
Multiple new tokens
$$p(\tilde{w} \mid w) = \sum_{z} p(\tilde{w} \mid z, w)\, p(z \mid w),$$
where $p(\tilde{w} \mid z, w) = \sum_{\tilde{z}} p(\tilde{w}, \tilde{z} \mid z, w)$ and $\tilde{z} = \tilde{z}_{1:M}$ are the assignments of the new tokens $\tilde{w} = \tilde{w}_{1:M}$. This inner sum is again intractable; by the chain rule,
$$p(\tilde{w} \mid z, w) = \prod_{m=1}^{M} p(\tilde{w}_m \mid \tilde{w}_{1:m-1}, z, w),$$
where $p(\tilde{w}_m \mid \tilde{w}_{1:m-1}, z, w)$ can itself be estimated by Gibbs sampling over $\tilde{z}_{1:m-1}$ for $m = 1, \dots, M$.
Overall, the predictive probability is given by
$$p(\tilde{w} \mid w) \approx \frac{1}{S} \sum_{s=1}^{S} p(\tilde{w} \mid z^{(s)}, w).$$
- The counts derived from $w$ and $z^{(s)}$ are constants in a Gibbs sampling run over $\tilde{z}$
- Not susceptible to label switching
- Computationally expensive: in practice only one sample (the one with the highest probability) is used, that is,
$$p(\tilde{w} \mid w) \approx p(\tilde{w} \mid z^{*}, w).$$
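Choosing the highest-probability sample $z^{*}$ requires scoring each Gibbs sample; since $p(z \mid w) \propto p(w, z)$, it suffices to compare collapsed joint probabilities. A sketch of that score, again under the symmetric-prior assumption and with illustrative names:

```python
# Collapsed log p(w, z) for the Dirichlet-multinomial mixture, obtained by
# integrating out theta and phi; used to select z* = argmax_s p(z^(s) | w).
from math import lgamma
import numpy as np

def log_joint(w, z, K, V, alpha, beta):
    N = len(w)
    n_k = np.bincount(z, minlength=K)
    n_kv = np.zeros((K, V), dtype=int)
    np.add.at(n_kv, (z, w), 1)
    # log p(z): Dirichlet-multinomial over component assignments
    lp = lgamma(K * alpha) - lgamma(K * alpha + N)
    lp += sum(lgamma(alpha + nk) - lgamma(alpha) for nk in n_k)
    # log p(w | z): one Dirichlet-multinomial term per component
    for k in range(K):
        lp += lgamma(V * beta) - lgamma(V * beta + n_k[k])
        lp += sum(lgamma(beta + c) - lgamma(beta) for c in n_kv[k])
    return lp
```

With `K = 1`, `V = 2`, and symmetric `beta`, a single token has probability exactly $1/2$, which gives a quick sanity check on the formula.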