Hyperparameter inference: slice sampling
Hyperparameters of LDA
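The hyperparameters of LDA are $\alpha$, $\mathbf{m}$, $\beta$, and $\mathbf{n}$: the document-topic distributions have a Dirichlet prior with concentration $\alpha$ and base measure $\mathbf{m}$, and the topic-word distributions have a Dirichlet prior with concentration $\beta$ and base measure $\mathbf{n}$.
In this lecture, we assume $\mathbf{m}$ is a $K$-dimensional uniform distribution, and $\mathbf{n}$ is a $V$-dimensional uniform distribution, where $K$ is the number of topics and $V$ is the vocabulary size.
Also, $\alpha > 0$ and $\beta > 0$.
In order to compute the posterior distribution, we first have to choose priors $P(\alpha)$ and $P(\beta)$ for the hyperparameters $\alpha$ and $\beta$.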
Gamma distribution
- Domain: $x \in (0, \infty)$
- Parameters:
  - $\theta > 0$: scale parameter
  - $k > 0$: shape parameter
- Mean: $k\theta$
- Variance: $k\theta^2$
- With shape $k = 1$ and scale $\theta \to \infty$ (a very broad gamma distribution), the density becomes nearly flat, so the distribution can be used as an uninformative prior that is approximately uniform over the domain.
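The properties listed above can be checked numerically. The following is a minimal sketch, assuming the shape/scale parameterization; the function names are illustrative and not part of the lecture.

```python
import math

def gamma_log_pdf(x, shape, scale):
    """Log-density of a gamma distribution with the given shape and scale."""
    if x <= 0:
        return -math.inf  # outside the domain (0, inf)
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def gamma_mean(shape, scale):
    return shape * scale

def gamma_variance(shape, scale):
    return shape * scale ** 2

# With shape 1 and a huge scale, the log-density barely changes between
# x = 1 and x = 100, i.e. the prior is nearly flat over that range.
flat_gap = gamma_log_pdf(1.0, 1.0, 1e6) - gamma_log_pdf(100.0, 1.0, 1e6)
```

Working with the log-density rather than the density avoids underflow when the prior is later multiplied with a likelihood.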
Hyperparameter inference
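How to draw samples of $\mathbf{z}$, $\alpha$, and $\beta$ as a whole? The target is the posterior
$$P(\mathbf{z}, \alpha, \beta \mid \mathbf{w}) = \frac{P(\mathbf{w}, \mathbf{z}, \alpha, \beta)}{P(\mathbf{w})},$$
where the denominator is a normalization constant. The numerator can be factorized as
$$P(\mathbf{w}, \mathbf{z}, \alpha, \beta) = P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)\, P(\alpha)\, P(\beta).$$
We have learned how to perform Gibbs sampling on $\mathbf{z}$:
$$P(z_i = k \mid \mathbf{z}_{\setminus i}, \mathbf{w}, \alpha, \beta) \propto \frac{N^{\setminus i}_{w_i \mid k} + \beta_{w_i}}{N^{\setminus i}_{k} + \beta} \left( N^{\setminus i}_{k \mid d_i} + \alpha_k \right),$$
where the shorthand notation $\alpha_k = \alpha m_k$ and $\beta_v = \beta n_v$ is used (with $\mathbf{m}$ and $\mathbf{n}$ the base measures of the two Dirichlet priors), and the counts $N^{\setminus i}$ exclude token $i$.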
Blocked Gibbs sampling
Repeat the following steps
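- Sample $\mathbf{z}$ by Gibbs sampling, usually for several rounds.
- Sample $\alpha, \beta$ from $P(\alpha, \beta \mid \mathbf{w}, \mathbf{z})$.
To sample $\alpha, \beta$ we have to be able to compute the following distribution:
$$P(\alpha, \beta \mid \mathbf{w}, \mathbf{z}) = \frac{P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)\, P(\alpha)\, P(\beta)}{\iint P(\mathbf{w}, \mathbf{z} \mid \alpha', \beta')\, P(\alpha')\, P(\beta')\, d\alpha'\, d\beta'}$$
Notice that this is a continuous distribution, unlike that of $\mathbf{z}$.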
- Can we compute the denominator?
- Do we need to compute the denominator?
Slice sampling
Slice sampling [1] is applicable when we want to draw a sample from $P(x)$ but can only compute the unnormalized distribution $P^*(x) \propto P(x)$.
The idea of slice sampling is to sample uniformly from the area under the curve $P^*(x)$ and keep only the horizontal coordinate $x$.
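Slice sampling
One transition from the current point $x$:
- Evaluate $P^*(x)$.
- Draw a vertical coordinate $u' \sim \mathrm{Uniform}(0, P^*(x))$.
- Create a horizontal interval $(x_l, x_r)$ enclosing $x$.
- Loop: draw $x' \sim \mathrm{Uniform}(x_l, x_r)$ and evaluate $P^*(x')$. If $P^*(x') > u'$, accept $x'$ and stop; otherwise modify the interval and repeat.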
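Stepping out
The interval $(x_l, x_r)$ around the current point $x$ is created from an initial width $w$, where $u'$ is the height of the current slice:
- Draw $r \sim \mathrm{Uniform}(0, 1)$.
- Set $x_l = x - r w$ and $x_r = x + (1 - r) w$.
- While $P^*(x_l) > u'$, set $x_l = x_l - w$.
- While $P^*(x_r) > u'$, set $x_r = x_r + w$.
Notes:
- Evaluating $P^*$ is expensive, so the last two loops are usually skipped and $w$ is instead made big enough at the start, based on prior knowledge.
- In practice, only a limited number of stepping-out steps is allowed; otherwise the interval might keep expanding in some rare cases.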
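Shrinkage
When a point $x'$ drawn uniformly from $(x_l, x_r)$ is rejected, the interval is shrunk so that it still contains the current point $x$: if $x' > x$, set $x_r = x'$; otherwise set $x_l = x'$.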
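The stepping-out and shrinkage procedures can be combined into one transition. The sketch below is an illustration rather than the lecture's own code: it works with the log of the unnormalized density, and it assumes the capped stepping-out scheme from Neal's slice sampling paper, where a budget of `max_steps` expansions is split at random between the two sides. The names `slice_sample_1d`, `w`, and `max_steps` are hypothetical.

```python
import math
import random

def slice_sample_1d(log_p_star, x0, w=1.0, max_steps=50, rng=random):
    """One slice-sampling transition for an unnormalized log-density."""
    # Vertical level u' ~ Uniform(0, P*(x0)), stored as log u'.
    log_u = log_p_star(x0) + math.log(1.0 - rng.random())
    # Place an interval of width w at a random offset around x0.
    r = rng.random()
    left, right = x0 - r * w, x0 + (1.0 - r) * w
    # Stepping out: split the expansion budget at random between both sides.
    j = int(rng.random() * max_steps)
    k = (max_steps - 1) - j
    while j > 0 and log_p_star(left) > log_u:
        left -= w
        j -= 1
    while k > 0 and log_p_star(right) > log_u:
        right += w
        k -= 1
    # Shrinkage: propose uniformly and shrink towards x0 on rejection.
    while True:
        x1 = rng.uniform(left, right)
        if log_p_star(x1) > log_u:
            return x1
        if x1 > x0:
            right = x1
        else:
            left = x1

# Usage: sample from a standard normal known only up to a constant.
rng = random.Random(0)
x, draws = 0.0, []
for _ in range(5000):
    x = slice_sample_1d(lambda t: -0.5 * t * t, x, w=2.0, rng=rng)
    draws.append(x)
```

Skipping the two stepping-out loops, as the notes suggest, corresponds to setting `max_steps=1` and choosing `w` generously.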
Hyperparameter inference
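$$\begin{aligned}
P^*(\alpha, \beta \mid \mathbf{w}, \mathbf{z}) &= P(\mathbf{w}, \mathbf{z}, \alpha, \beta) \\
&= P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)\, P(\alpha)\, P(\beta) \\
&= P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
\end{aligned}$$
- The equality sign here is applied to the unnormalized distribution $P^*$, so it holds up to a constant factor.
- The second line holds because $\alpha$ and $\beta$ have independent priors that do not involve $\mathbf{w}$ or $\mathbf{z}$, so $P(\alpha, \beta) = P(\alpha)\, P(\beta)$.
- The third line holds due to the assumption that $P(\alpha)$ and $P(\beta)$ are uninformative, so they are treated as constants.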
Random variable transformation
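Consider the transformation function $y = f(x)$, where $x$ has density $p_X(x)$.
When $f$ is strictly monotone, the density of $y$ is
$$p_Y(y) = p_X\!\left(f^{-1}(y)\right) \left| \frac{d f^{-1}(y)}{d y} \right|.$$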
Change of variables
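There is no easy way to draw samples directly in $(0, \infty)$, so we instead consider a monotone mapping $a = \log \alpha$ and draw samples from the equivalent distribution in terms of $a \in (-\infty, \infty)$, given by
$$P(a) = P(\alpha) \left| \frac{d\alpha}{da} \right|,$$
where the Jacobian is
$$\frac{d\alpha}{da} = \frac{d e^{a}}{da} = e^{a} = \alpha.$$
Therefore $P(a)$ can be written as
$$P(a) = \alpha\, P(\alpha), \qquad \alpha = e^{a}.$$
Similarly, with $b = \log \beta$,
$$P(b) = \beta\, P(\beta), \qquad \beta = e^{b}.$$
By change of variable, a sample $a$ drawn from $P(a)$ gives a sample $\alpha = e^{a}$ from $P(\alpha)$.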
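In log-density form, the change of variables amounts to adding the log-Jacobian $a$ to the original log-density evaluated at $e^{a}$. The sketch below illustrates this under the assumption that the mapping is the logarithm; the helper name `to_log_space` and the gamma example (shape 2, scale 1, which happens to be normalized) are chosen for illustration only.

```python
import math

def to_log_space(log_p_star):
    """Turn a log-density over alpha > 0 into one over a = log(alpha)."""
    def log_q_star(a):
        # log P*(a) = log P*(alpha) + log|d alpha / d a| = log P*(e^a) + a
        return log_p_star(math.exp(a)) + a
    return log_q_star

def gamma21_log_density(alpha):
    """Log-density of a gamma with shape 2 and scale 1: alpha * exp(-alpha)."""
    return math.log(alpha) - alpha

log_q = to_log_space(gamma21_log_density)

# Riemann sum over a in [-20, 5]: the transformed density should still
# carry (approximately) total probability mass 1.
step = 0.001
mass = sum(math.exp(log_q(-20.0 + i * step)) for i in range(25001)) * step
```

Omitting the `+ a` term would target a different distribution, which is why the Jacobian has to be carried along when sampling in the transformed variable.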
Multivariate slice sampling
Instead of sampling each variable in turn, conditioned on the others, multivariate slice sampling draws multiple variables in one go.
Multivariate slice sampling
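One common multivariate variant replaces the interval with an axis-aligned hyperrectangle that is placed at random around the current point and shrunk on rejection. The sketch below assumes that variant (without stepping out), which may differ from the algorithm intended in the lecture; `slice_sample_nd` and `widths` are hypothetical names.

```python
import math
import random

def slice_sample_nd(log_p_star, x0, widths, rng=random):
    """One hyperrectangle slice-sampling transition for a point x0 (a list)."""
    # Vertical level u' ~ Uniform(0, P*(x0)), stored as log u'.
    log_u = log_p_star(x0) + math.log(1.0 - rng.random())
    # Randomly position a box with the given side widths around x0.
    lower = [x - rng.random() * w for x, w in zip(x0, widths)]
    upper = [lo + w for lo, w in zip(lower, widths)]
    while True:
        x1 = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        if log_p_star(x1) > log_u:
            return x1
        # Shrink every side of the box towards x0.
        for i, (new, old) in enumerate(zip(x1, x0)):
            if new < old:
                lower[i] = new
            else:
                upper[i] = new

# Usage: two independent standard normals, known up to a constant.
rng = random.Random(1)
point, draws = [0.0, 0.0], []
for _ in range(5000):
    point = slice_sample_nd(lambda p: -0.5 * (p[0] ** 2 + p[1] ** 2),
                            point, widths=[3.0, 3.0], rng=rng)
    draws.append(point)
```

Because there is no stepping out here, the `widths` should be chosen generously, in line with the earlier advice to pick a large enough initial width from prior knowledge.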
Back to hyperparameter inference
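With uninformative priors on $\alpha$ and $\beta$, the unnormalized target is
$$P^*(\alpha, \beta \mid \mathbf{w}, \mathbf{z}) = P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta),$$
which is the evidence with known topics, see Evidence.
To draw from $P(\alpha, \beta \mid \mathbf{w}, \mathbf{z})$ using multivariate slice sampling, let $a = \log \alpha$ and $b = \log \beta$. The Jacobian is given by
$$\left| \frac{\partial(\alpha, \beta)}{\partial(a, b)} \right| = \begin{vmatrix} e^{a} & 0 \\ 0 & e^{b} \end{vmatrix} = e^{a} e^{b} = \alpha \beta.$$
So
$$P^*(a, b \mid \mathbf{w}, \mathbf{z}) = \alpha \beta\, P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta), \qquad \alpha = e^{a},\ \beta = e^{b}.$$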
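The log of this target can be computed from the topic assignment counts. The sketch below assumes the standard Dirichlet-multinomial form of the evidence with uniform base measures (so each topic gets $\alpha / K$ and each word type gets $\beta / V$); the count layout and the names `log_evidence` and `log_target` are assumptions made for illustration.

```python
import math

def log_evidence(doc_topic, topic_word, alpha, beta):
    """log P(w, z | alpha, beta) for LDA with uniform base measures.

    doc_topic[d][k]: number of tokens in document d assigned to topic k.
    topic_word[k][v]: number of tokens of word type v assigned to topic k.
    """
    K = len(topic_word)
    V = len(topic_word[0])
    lg = math.lgamma
    total = 0.0
    for counts in doc_topic:   # contribution of P(z | alpha)
        total += lg(alpha) - lg(sum(counts) + alpha)
        total += sum(lg(c + alpha / K) - lg(alpha / K) for c in counts)
    for counts in topic_word:  # contribution of P(w | z, beta)
        total += lg(beta) - lg(sum(counts) + beta)
        total += sum(lg(c + beta / V) - lg(beta / V) for c in counts)
    return total

def log_target(ab, doc_topic, topic_word):
    """log P*(a, b | w, z) with a = log(alpha), b = log(beta)."""
    a, b = ab
    # The log-Jacobian of the mapping is log(alpha * beta) = a + b.
    return log_evidence(doc_topic, topic_word, math.exp(a), math.exp(b)) + a + b

# Tiny check: one document with one token, K = 2 topics, V = 2 word types.
# The evidence is (1/2) * (1/2) = 1/4 for any alpha and beta.
toy = log_evidence([[1, 0]], [[1, 0], [0, 0]], alpha=0.7, beta=3.0)
```

A function like `log_target` is what would be handed to any slice sampler that accepts an unnormalized log-density over $(a, b)$; the sampled values are mapped back with $\alpha = e^{a}$ and $\beta = e^{b}$.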