Hyperparameter optimization
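- Hyperparameters of LDA are $\alpha$, $m$, $\beta$, and $n$.
  - $m$: $T$-dimensional, where $T$ is the number of topics
  - $n$: $W$-dimensional, where $W$ is the vocabulary size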
- Previously, we assumed that symmetric Dirichlet priors were used.
- Now relax the symmetry assumption and consider asymmetric Dirichlet priors.
- How do we identify the optimal hyperparameters according to a given criterion?
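We will work with the products $\alpha m$ and $\beta n$ as a whole, because they always occur together.
Assumption: the components $\alpha m_t$ are drawn i.i.d. from a certain prior for all $t$, and so are the components $\beta n_w$ for all $w$.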
Gamma distribution
In the limit where the shape parameter equals 1 and the scale parameter goes to infinity, the Gamma distribution becomes uniform over $(0, \infty)$.
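A quick numerical illustration of this limit, as a sketch only. It assumes the shape/scale parameterization of the Gamma density, and the function names `gamma_log_pdf` and `density_ratio` are invented here. With shape 1, the ratio between the densities at two far-apart points tends to 1 as the scale grows, which is what "becoming uniform" means in practice.

```python
import math

def gamma_log_pdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def density_ratio(x1, x2, shape, scale):
    """p(x1) / p(x2); a value close to 1 means the density is nearly flat."""
    return math.exp(gamma_log_pdf(x1, shape, scale)
                    - gamma_log_pdf(x2, shape, scale))

# With shape = 1 the ratio equals exp((x2 - x1) / scale), which tends to 1.
for scale in (1.0, 10.0, 1e3, 1e6):
    print(scale, density_ratio(0.1, 50.0, shape=1.0, scale=scale))
```

The limiting prior is improper (it does not integrate to 1), so it is best read as "no preference among positive values" rather than as a distribution to sample from.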
Posterior
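Hyperparameter optimization
Instead of sampling hyperparameter values from the posterior, maximize the posterior probability itself.
It follows an EM-like framework:
repeat: sample the topic assignments $z$ using the previous $\alpha m$ and $\beta n$, then optimize $\alpha m$ and $\beta n$.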
Notation:
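Objective:
maximize $\log P(w, z \mid \alpha m, \beta n) = \log P(w \mid z, \beta n) + \log P(z \mid \alpha m)$,
where $\alpha m$ is only involved in $P(z \mid \alpha m)$, and $\beta n$ is only involved in $P(w \mid z, \beta n)$.
So we can optimize $\alpha m$ and $\beta n$ alternately until convergence.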
Note that the objective is concave in the hyperparameters, which means the optimization converges to the global optimum.
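To make the objective concrete, here is a minimal sketch of one of its terms, under the assumption that the term depending on the document-topic hyperparameters is the Dirichlet-multinomial probability of the topic assignments, $\log P(z \mid \alpha m)$. The function name and the count layout (one row of topic counts per document) are choices made for this example.

```python
import math

def log_prob_assignments(doc_topic_counts, alpha_m):
    """log P(z | alpha*m) for a Dirichlet-multinomial.

    doc_topic_counts[d][t] is the number of tokens in document d assigned
    to topic t; alpha_m[t] is the (unnormalized) Dirichlet parameter.
    """
    alpha = sum(alpha_m)  # total concentration
    total = 0.0
    for counts in doc_topic_counts:
        n_d = sum(counts)
        total += math.lgamma(alpha) - math.lgamma(n_d + alpha)
        for n_td, am_t in zip(counts, alpha_m):
            total += math.lgamma(n_td + am_t) - math.lgamma(am_t)
    return total

# One document whose two tokens share a topic, under a flat prior:
# P = 1/2 * 2/3 = 1/3.
print(math.exp(log_prob_assignments([[2, 0]], [1.0, 1.0])))  # approx. 0.3333
```

The same function applied to per-topic word counts and the topic-word parameters gives the other term, since both terms have the same Dirichlet-multinomial form.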
Fixed-point iteration
Minka’s fixed-point iteration [1] is a fast algorithm to optimize the hyperparameters of LDA.
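Bound 1
For any $z > 0$, $\hat{z} > 0$ and integer $n \ge 0$:
$$\log\Gamma(z) - \log\Gamma(z+n) \ge \log\Gamma(\hat{z}) - \log\Gamma(\hat{z}+n) + [\Psi(\hat{z}+n) - \Psi(\hat{z})](\hat{z} - z)$$
Bound 2
For any $z > 0$, $\hat{z} > 0$ and integer $n \ge 1$:
$$\log\Gamma(z+n) - \log\Gamma(z) \ge \log\Gamma(\hat{z}+n) - \log\Gamma(\hat{z}) + \hat{z}\,[\Psi(\hat{z}+n) - \Psi(\hat{z})](\log z - \log\hat{z})$$
where $\Psi$ is the digamma function.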
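A small numerical check can make the two bounds tangible. The sketch below assumes they are stated in log form as tangent-style lower bounds on $\log\Gamma(z) - \log\Gamma(z+n)$ and $\log\Gamma(z+n) - \log\Gamma(z)$, tight at a reference point `z_hat`. All function names are made up for this example, and the digamma difference is computed exactly for integer `n` with the recurrence $\Psi(x+1) = \Psi(x) + 1/x$, so only the standard library is needed.

```python
import math

def digamma_diff(z, n):
    """Psi(z + n) - Psi(z) for integer n >= 0, via Psi(x + 1) = Psi(x) + 1/x."""
    return sum(1.0 / (z + i) for i in range(n))

def log_gamma_ratio(z, n):
    """log Gamma(z + n) - log Gamma(z)."""
    return math.lgamma(z + n) - math.lgamma(z)

def bound1(z, z_hat, n):
    """Lower bound on log Gamma(z) - log Gamma(z + n), tight at z == z_hat."""
    return -log_gamma_ratio(z_hat, n) + digamma_diff(z_hat, n) * (z_hat - z)

def bound2(z, z_hat, n):
    """Lower bound on log Gamma(z + n) - log Gamma(z), tight at z == z_hat."""
    return (log_gamma_ratio(z_hat, n)
            + z_hat * digamma_diff(z_hat, n) * (math.log(z) - math.log(z_hat)))

z_hat, n = 0.7, 5
for z in (0.05, 0.3, 0.7, 2.0, 10.0):
    # Both inequalities hold everywhere and become equalities at z == z_hat.
    assert -log_gamma_ratio(z, n) >= bound1(z, z_hat, n) - 1e-12
    assert log_gamma_ratio(z, n) >= bound2(z, z_hat, n) - 1e-12
```

The first bound is linear in `z` and the second is linear in `log(z)`, which is what lets the bounded objective be maximized in closed form.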
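Fixed-point iteration for hyperparameters of LDA
Supposing $[\alpha m_t]^\star$ is the optimal parameter given the current estimate $\alpha m_t$, it follows that
$$[\alpha m_t]^\star = \alpha m_t \, \frac{\sum_d [\Psi(N_{t|d} + \alpha m_t) - \Psi(\alpha m_t)]}{\sum_d [\Psi(N_d + \alpha) - \Psi(\alpha)]}$$
where $N_{t|d}$ is the number of tokens in document $d$ assigned to topic $t$, $N_d$ is the length of document $d$, and $\alpha = \sum_t \alpha m_t$. The update for $\beta n_w$ is analogous, using the per-topic word counts.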
How can the digamma differences $\Psi(z+n) - \Psi(z)$ be evaluated using the recurrence relation of the gamma function?
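One possible answer, sketched under stated assumptions: since $\Gamma(x+1) = x\,\Gamma(x)$ implies $\Psi(x+1) = \Psi(x) + 1/x$, the difference $\Psi(z+n) - \Psi(z)$ equals $\sum_{i=0}^{n-1} 1/(z+i)$ for integer counts $n$, so no digamma routine is needed. The code below plugs this into a Minka-style fixed-point update for the document-topic parameters. The function names and the toy counts are invented for this example, and the count layout (one row of topic counts per document) is an assumption.

```python
def digamma_diff(z, n):
    """Psi(z + n) - Psi(z) for integer n >= 0, via Psi(x + 1) = Psi(x) + 1/x."""
    return sum(1.0 / (z + i) for i in range(n))

def fixed_point_step(doc_topic_counts, alpha_m):
    """One fixed-point update of every alpha*m_t.

    doc_topic_counts[d][t] is the number of tokens in document d assigned
    to topic t; alpha_m[t] is the current (unnormalized) parameter.
    """
    alpha = sum(alpha_m)
    # Denominator is shared by all topics: sum over documents of
    # Psi(N_d + alpha) - Psi(alpha).
    denom = sum(digamma_diff(alpha, sum(counts)) for counts in doc_topic_counts)
    updated = []
    for t, am_t in enumerate(alpha_m):
        numer = sum(digamma_diff(am_t, counts[t]) for counts in doc_topic_counts)
        updated.append(am_t * numer / denom)
    return updated

def optimize_alpha_m(doc_topic_counts, alpha_m, iterations=20):
    """Run a fixed number of updates; a few iterations are often enough."""
    for _ in range(iterations):
        alpha_m = fixed_point_step(doc_topic_counts, alpha_m)
    return alpha_m

# Toy counts (hypothetical): 4 documents, 3 topics.
counts = [[9, 1, 0], [0, 8, 2], [7, 0, 3], [1, 9, 0]]
print(optimize_alpha_m(counts, [1.0, 1.0, 1.0]))
```

The sketch assumes every document is non-empty and every topic is used at least once. A topic with all-zero counts is driven to a parameter of exactly 0 by this update.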
Remarks
- Run a few iterations only: use optimization as a proxy for sampling
- Use the asymmetric model: stop words will stand out
- Workaround: fix $m$ and $n$ to uniform distributions and still use fixed-point iteration to optimize $\alpha$ and $\beta$.
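The workaround in the last remark can be sketched as follows, assuming the document-topic base measure is fixed to uniform ($m_t = 1/T$) so that only the scalar concentration $\alpha$ is updated. Maximizing the same two lower bounds with respect to $\alpha$ alone gives the update used below. The function names and the toy counts are invented for this example.

```python
def digamma_diff(z, n):
    """Psi(z + n) - Psi(z) for integer n >= 0, via Psi(x + 1) = Psi(x) + 1/x."""
    return sum(1.0 / (z + i) for i in range(n))

def symmetric_alpha_step(doc_topic_counts, alpha):
    """One fixed-point update of alpha with m fixed to the uniform vector."""
    num_topics = len(doc_topic_counts[0])
    am = alpha / num_topics  # every topic shares the same parameter
    numer = sum(digamma_diff(am, n_td)
                for counts in doc_topic_counts for n_td in counts)
    denom = sum(digamma_diff(alpha, sum(counts)) for counts in doc_topic_counts)
    return alpha * numer / (num_topics * denom)

# Toy counts (hypothetical): 4 documents, 3 topics.
counts = [[9, 1, 0], [0, 8, 2], [7, 0, 3], [1, 9, 0]]
alpha = 3.0
for _ in range(20):
    alpha = symmetric_alpha_step(counts, alpha)
print(alpha)
```

The same update applies to $\beta$ with $n$ fixed to uniform, using per-topic word counts in place of per-document topic counts.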