Hyperparameter inference: slice sampling
Hyperparameters of LDA
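The hyperparameters of LDA are $\alpha$, $\mathbf{m}$, $\beta$, and $\mathbf{n}$: the document-topic distributions have a Dirichlet prior with concentration $\alpha$ and base measure $\mathbf{m}$, and the topic-word distributions have a Dirichlet prior with concentration $\beta$ and base measure $\mathbf{n}$.
In this lecture, we assume $\mathbf{m}$ is a $K$-dimensional uniform distribution (over topics), and $\mathbf{n}$ is a $V$-dimensional uniform distribution (over the vocabulary).
Also, $\alpha > 0$ and $\beta > 0$.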
In order to compute the posterior distribution, we have to first choose priors $P(\alpha)$ and $P(\beta)$ for the hyperparameters $\alpha$ and $\beta$.
Gamma distribution
- Domain: $x \in (0, \infty)$
- Parameters:
  $s > 0$: scale parameter
  $c > 0$: shape parameter
- Mean: $sc$
- Variance: $s^2 c$
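- When the mean is held at 1 while the shape parameter tends to 0 (a very broad gamma distribution), the distribution can be used as an uninformative prior. In this limit the density is proportional to $1/x$, which is uniform over $\ln x$ rather than over $x$ itself.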
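The mean and variance formulas above (mean = scale times shape, variance = scale squared times shape) can be checked empirically. The following is a minimal sketch using Python's standard library; the helper name `gamma_moments` and the chosen parameter values are illustrative assumptions, not part of the lecture.

```python
import random
import statistics


def gamma_moments(scale, shape, n=200_000, seed=0):
    """Estimate the mean and variance of a gamma distribution by sampling."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes the shape first, then the scale.
    xs = [rng.gammavariate(shape, scale) for _ in range(n)]
    return statistics.fmean(xs), statistics.pvariance(xs)


if __name__ == "__main__":
    mean, var = gamma_moments(scale=2.0, shape=3.0)
    # Theory: mean = 2 * 3 = 6 and variance = 2**2 * 3 = 12.
    print(mean, var)
```

With scale 2 and shape 3 the estimates should land close to the theoretical values 6 and 12.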
Hyperparameter inference
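How do we draw samples of $\mathbf{z}$, $\alpha$, and $\beta$ as a whole? Writing $\mathbf{w}$ for the observed words and $\mathbf{z}$ for the topic assignments, the joint posterior is
$$P(\mathbf{z}, \alpha, \beta \mid \mathbf{w}) = \frac{P(\mathbf{w}, \mathbf{z}, \alpha, \beta)}{P(\mathbf{w})},$$
where the denominator is a normalization constant.
The numerator can be factorized as
$$P(\mathbf{w}, \mathbf{z}, \alpha, \beta) = P(\mathbf{w} \mid \mathbf{z}, \beta)\, P(\mathbf{z} \mid \alpha)\, P(\alpha)\, P(\beta).$$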
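We have learned how to perform Gibbs sampling on $\mathbf{z}$:
$$P(z_i = k \mid \mathbf{z}_{\setminus i}, \mathbf{w}, \alpha, \beta) \propto \left(N^{\setminus i}_{k \mid d_i} + \alpha m_k\right) \frac{N^{\setminus i}_{w_i \mid k} + \beta n_{w_i}}{N^{\setminus i}_{k} + \beta},$$
where the shorthand notation $N^{\setminus i}_{k \mid d}$ (number of tokens in document $d$ assigned to topic $k$), $N^{\setminus i}_{v \mid k}$ (number of tokens of word $v$ assigned to topic $k$), and $N^{\setminus i}_{k} = \sum_v N^{\setminus i}_{v \mid k}$ is used, with every count excluding token $i$.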
Blocked Gibbs sampling
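Repeat the following steps:
- Sample $\mathbf{z}$ by Gibbs sampling, usually for several rounds.
- Sample $\alpha, \beta$ from $P(\alpha, \beta \mid \mathbf{w}, \mathbf{z})$.
To sample $\alpha, \beta$ we have to be able to compute the following distribution:
$$P(\alpha, \beta \mid \mathbf{w}, \mathbf{z}) = \frac{P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)\, P(\alpha)\, P(\beta)}{\iint P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)\, P(\alpha)\, P(\beta)\, d\alpha\, d\beta}$$
Notice that this is a continuous distribution, unlike that of $\mathbf{z}$.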
- Can we compute the denominator?
- Do we need to compute the denominator?
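The alternation described above can be sketched as a small driver loop. This is only a skeleton under stated assumptions: `blocked_gibbs`, `sweep_z`, and `sample_hyper` are hypothetical names, and the two callables stand in for a Gibbs sweep over the topic assignments and a draw of the hyperparameters given those assignments.

```python
def blocked_gibbs(z, hyper, sweep_z, sample_hyper, n_iters=100, z_rounds=5):
    """Alternate between updating z and updating the hyperparameters.

    sweep_z(z, hyper) -> new z         (hypothetical: one Gibbs sweep over z)
    sample_hyper(z, hyper) -> new hyper (hypothetical: one hyperparameter draw)
    """
    samples = []
    for _ in range(n_iters):
        # Block 1: several rounds of Gibbs sampling on the topic assignments.
        for _ in range(z_rounds):
            z = sweep_z(z, hyper)
        # Block 2: one draw of the hyperparameters given the current z.
        hyper = sample_hyper(z, hyper)
        samples.append((z, hyper))
    return samples
```

Passing the two update steps in as functions keeps the loop independent of how each block is actually sampled.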
Slice sampling
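Slice sampling [1] is applicable when we want to draw a sample from a distribution $P(x)$ but can only compute the unnormalized distribution $P^*(x) \propto P(x)$.
The idea of slice sampling is to sample points uniformly from the region under the curve of $P^*(x)$ and keep only their $x$ coordinates.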
Stepping out
- Evaluating $P^*(x)$ is expensive, so the two loops that extend the interval to the left and to the right are usually skipped; instead, the initial width $w$ is chosen large enough based on prior knowledge.
- In practice, only a limited number of stepping-out steps is allowed, since otherwise the interval might keep expanding in some rare cases.
Shrinkage
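The slice, stepping-out, and shrinkage steps can be combined into one update, sketched below in Python. This is a minimal illustration rather than the lecture's own code: the name `slice_sample`, its parameters, and the choice to work with the log of the unnormalized density are assumptions. The stepping-out budget `max_steps` is split at random between the two sides so that the update stays reversible when the limit is reached.

```python
import math
import random


def slice_sample(log_p_star, x0, w=1.0, max_steps=50, rng=random):
    """One slice-sampling update of x0 for a 1-D unnormalized log density."""
    # Slice: draw a height uniformly under the curve at x0 (in log space).
    log_u = log_p_star(x0) + math.log(1.0 - rng.random())

    # Stepping out: place an interval of width w at random around x0 ...
    x_l = x0 - rng.random() * w
    x_r = x_l + w
    # ... and extend each side until it leaves the slice or the budget ends.
    j = int(rng.random() * max_steps)
    k = (max_steps - 1) - j
    while j > 0 and log_p_star(x_l) > log_u:
        x_l -= w
        j -= 1
    while k > 0 and log_p_star(x_r) > log_u:
        x_r += w
        k -= 1

    # Shrinkage: propose uniformly in the interval; on rejection, shrink the
    # interval towards x0 so the next proposal is more likely to be accepted.
    while True:
        x1 = x_l + rng.random() * (x_r - x_l)
        if log_p_star(x1) >= log_u:
            return x1
        if x1 < x0:
            x_l = x1
        else:
            x_r = x1
```

Repeatedly calling `x = slice_sample(log_p_star, x)` yields a Markov chain whose stationary distribution is proportional to `exp(log_p_star(x))`. The sketch assumes `log_p_star` can be evaluated anywhere on the real line.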
Hyperparameter inference
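$$\begin{aligned}
P(\alpha \mid \mathbf{w}, \mathbf{z}, \beta) &\propto P(\mathbf{w} \mid \mathbf{z}, \beta)\, P(\mathbf{z} \mid \alpha)\, P(\alpha)\, P(\beta) \\
&\propto P(\mathbf{z} \mid \alpha)\, P(\alpha) \\
&\propto P(\mathbf{z} \mid \alpha)
\end{aligned}$$
- Each relation here is read as a function of $\alpha$, so it holds up to a constant factor.
- The second line holds because $\alpha$ is not involved in $P(\mathbf{w} \mid \mathbf{z}, \beta)$ and $P(\beta)$.
- The third line holds due to the assumption that $P(\alpha)$ and $P(\beta)$ are uninformative, so they are treated as constants.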
Random variable transformation
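Consider the transformation function $y = f(x)$ applied to a random variable $x$ with density $P_X(x)$.
When $f$ is strictly monotone, the density of $y$ is
$$P_Y(y) = P_X\!\left(f^{-1}(y)\right) \left| \frac{d f^{-1}(y)}{dy} \right|.$$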
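As a numerical sanity check of the transformation rule for a strictly monotone function, the sketch below takes an exponential density on the positive half-line, maps it through the logarithm, and confirms that the probability of an interval is the same before and after the transformation. The example density and the names `p_x`, `p_a`, and `integrate` are assumptions made for illustration.

```python
import math


def p_x(x):
    """Density of x ~ Exponential(1) on (0, inf)."""
    return math.exp(-x)


def p_a(a):
    """Density of a = ln x: the density of x at e^a times |dx/da| = e^a."""
    x = math.exp(a)
    return p_x(x) * x


def integrate(f, lo, hi, n=10_000):
    """Trapezoid rule on [lo, hi] with n equal steps."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


# The probability of 0.5 < x < 2 must equal that of ln 0.5 < a < ln 2.
prob_x = integrate(p_x, 0.5, 2.0)
prob_a = integrate(p_a, math.log(0.5), math.log(2.0))
```

Dropping the Jacobian factor in `p_a` would make the two probabilities disagree, which is a quick way to see why the factor is needed.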
Change of variables
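There is no easy way to draw samples in $(0, \infty)$, so we instead consider a monotone mapping $a = \ln \alpha$ and draw samples from the equivalent distribution in terms of $a$ given by
$$P(a \mid \mathbf{w}, \mathbf{z}, \beta) = P(\alpha \mid \mathbf{w}, \mathbf{z}, \beta) \left| \frac{d\alpha}{da} \right|,$$
where the Jacobian is
$$\frac{d\alpha}{da} = e^{a} = \alpha.$$
Therefore, with the priors treated as constants, $P(a \mid \mathbf{w}, \mathbf{z}, \beta)$ can be written as
$$P(a \mid \mathbf{w}, \mathbf{z}, \beta) \propto \alpha\, P(\mathbf{z} \mid \alpha), \qquad \alpha = e^{a}.$$
Similarly, by change of variable with $b = \ln \beta$,
$$P(b \mid \mathbf{w}, \mathbf{z}, \alpha) \propto \beta\, P(\mathbf{w} \mid \mathbf{z}, \beta), \qquad \beta = e^{b}.$$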
Multivariate slice sampling
Instead of sampling each variable alternately, conditioned on the others, multivariate slice sampling can sample multiple variables in one go.
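One common way to do this is to replace the interval with an axis-aligned box that is placed at random around the current point and then shrunk on rejection. The sketch below is a minimal Python illustration of that variant, without stepping out; the name `slice_sample_nd`, its parameters, and the use of log densities are assumptions, not the lecture's own code.

```python
import math
import random


def slice_sample_nd(log_p_star, x0, widths, rng=random):
    """One multivariate slice-sampling update using a shrinking box."""
    # Slice: draw a height uniformly under the curve at x0 (in log space).
    log_u = log_p_star(x0) + math.log(1.0 - rng.random())

    # Place a box with the given side widths at random around x0.
    lower = [xi - rng.random() * wi for xi, wi in zip(x0, widths)]
    upper = [lo + wi for lo, wi in zip(lower, widths)]

    while True:
        # Propose uniformly inside the box.
        x1 = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        if log_p_star(x1) >= log_u:
            return x1
        # Rejected: shrink every side of the box towards x0.
        for i, (new, old) in enumerate(zip(x1, x0)):
            if new < old:
                lower[i] = new
            else:
                upper[i] = new
```

Because there is no stepping out in this variant, the side widths should be chosen on the scale of the distribution being sampled.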
Back to hyperparameter inference
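With the priors treated as constants, the joint posterior of the hyperparameters satisfies
$$P(\alpha, \beta \mid \mathbf{w}, \mathbf{z}) \propto P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta) = P(\mathbf{w} \mid \mathbf{z}, \beta)\, P(\mathbf{z} \mid \alpha),$$
which is the evidence with known topics, see Evidence.
To draw $\alpha, \beta$ from $P(\alpha, \beta \mid \mathbf{w}, \mathbf{z})$ using multivariate slice sampling, let
$$a = \ln \alpha, \qquad b = \ln \beta;$$
the Jacobian is given by
$$\left| \frac{\partial(\alpha, \beta)}{\partial(a, b)} \right| = \alpha \beta.$$
So
$$P(a, b \mid \mathbf{w}, \mathbf{z}) \propto \alpha \beta\, P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta), \qquad \alpha = e^{a},\ \beta = e^{b}.$$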
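The final step can be sketched end to end in Python, under several labeled assumptions: the base measures are uniform and the priors are treated as constants (as in this lecture), the evidence with known topics is taken to be a product of Dirichlet-multinomial terms computed from count tables, and the count tables `DOC_TOPIC` and `TOPIC_WORD` are made-up toy data. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy counts: DOC_TOPIC[d][k] and TOPIC_WORD[k][v].
DOC_TOPIC = [[8, 1, 1], [0, 9, 1], [1, 0, 9], [7, 2, 1]]
TOPIC_WORD = [[9, 6, 1, 0], [0, 1, 8, 3], [1, 0, 2, 9]]


def log_dirichlet_multinomial(counts, conc):
    """Sum over rows of the log probability of the assignments behind each
    row of counts, under a Dirichlet prior with concentration conc and a
    uniform base measure."""
    total = 0.0
    for row in counts:
        dim = len(row)
        total += math.lgamma(conc) - math.lgamma(sum(row) + conc)
        for n in row:
            total += math.lgamma(n + conc / dim) - math.lgamma(conc / dim)
    return total


def log_target(ab, doc_topic, topic_word):
    """log of alpha * beta * P(w, z | alpha, beta), alpha = e^a, beta = e^b."""
    a, b = ab
    log_evidence = (log_dirichlet_multinomial(doc_topic, math.exp(a))
                    + log_dirichlet_multinomial(topic_word, math.exp(b)))
    return log_evidence + a + b  # a + b is the log Jacobian log(alpha * beta)


def slice_sample_nd(log_f, x0, widths, rng):
    """One multivariate slice-sampling update using a shrinking box."""
    log_u = log_f(x0) + math.log(1.0 - rng.random())
    lower = [xi - rng.random() * wi for xi, wi in zip(x0, widths)]
    upper = [lo + wi for lo, wi in zip(lower, widths)]
    while True:
        x1 = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        if log_f(x1) >= log_u:
            return x1
        for i, (new, old) in enumerate(zip(x1, x0)):
            if new < old:
                lower[i] = new
            else:
                upper[i] = new


def sample_hyperparameters(doc_topic, topic_word, n_iters=200, seed=0):
    """Draw (alpha, beta) pairs by slice sampling in the log domain."""
    rng = random.Random(seed)
    ab = [0.0, 0.0]  # start at alpha = beta = 1
    draws = []
    for _ in range(n_iters):
        ab = slice_sample_nd(
            lambda p: log_target(p, doc_topic, topic_word), ab, [1.0, 1.0], rng)
        draws.append((math.exp(ab[0]), math.exp(ab[1])))
    return draws
```

Sampling in $(a, b)$ keeps $\alpha$ and $\beta$ positive without any boundary handling. One caveat of treating the priors as constants: this target does not decay for very large $\alpha$ or $\beta$, so a proper prior term could be added inside `log_target` if long runs drift.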