Hyperparameter optimization
- Hyperparameters of LDA are $\alpha$, $m$, $\beta$, and $n$.
- $m$: $K$-dimensional (one entry per topic)
- $n$: $V$-dimensional (one entry per vocabulary word)
- Previously, a symmetric Dirichlet prior was assumed.
- Here we relax the symmetry assumption and consider asymmetric Dirichlet priors.
- How do we identify the optimal hyperparameters according to a given criterion?
We will work with the products $\alpha m$ and $\beta n$ as a whole because they always occur together.
Assumption: the $\alpha m_k$ are drawn i.i.d. from a certain prior for all $k$, and so are the $\beta n_v$ for all $v$.
Gamma distribution
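Taking the shape-rate parametrization, with density proportional to $x^{a-1} e^{-bx}$: in the limit $a \to 1$, $b \to 0$, the Gamma distribution becomes uniform (an improper flat prior) over $(0, \infty)$.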
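As a quick numerical illustration of this limit, the sketch below evaluates the Gamma density in the shape-rate parametrization. That parametrization is an assumption here: with shape-scale, the scale would go to infinity instead. The function names are hypothetical. A flat density has ratio 1 between any two points, so the sketch compares the density ratio at two far-apart points for a peaked prior and for one close to the limit.

```python
import math

def gamma_log_pdf(x, shape, rate):
    """Log density of Gamma(shape, rate) at x > 0 (shape-rate form, assumed here)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def density_ratio(x1, x2, shape, rate):
    """pdf(x1) / pdf(x2); this is 1 for every pair of points if the density is flat."""
    return math.exp(gamma_log_pdf(x1, shape, rate) - gamma_log_pdf(x2, shape, rate))

# A clearly non-flat prior: the density at 1 dwarfs the density at 100.
peaked = density_ratio(1.0, 100.0, shape=2.0, rate=1.0)

# Shape close to 1 and rate close to 0: the two densities are nearly equal.
nearly_flat = density_ratio(1.0, 100.0, shape=1.0001, rate=1e-6)

print(peaked, nearly_flat)
```

The second ratio approaches 1 as the shape moves toward 1 and the rate toward 0, which is the sense in which the prior stops favoring any particular value.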
Posterior
Hyperparameter optimization
Instead of sampling hyperparameter values from the posterior, maximize the posterior probability itself.
It follows an EM-like framework:
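repeat:
- sample the topic assignments $z$ given the current hyperparameters
- optimize $\alpha m$ and $\beta n$ using the previous sample of $z$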
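The alternation between sampling and optimizing can be sketched as follows. This is only a skeleton under stated assumptions: `sample_assignments` and `optimize_hyperparameters` are hypothetical callables standing in for a Gibbs sweep over topic assignments and for the hyperparameter update, and neither is defined in these notes. The toy stand-ins at the bottom exist only to make the skeleton runnable.

```python
def stochastic_em(sample_assignments, optimize_hyperparameters, hyper, n_rounds=10):
    """Alternate a sampling step and an optimization step for n_rounds rounds.

    sample_assignments(hyper, z_prev) -> z       (hypothetical sampling step)
    optimize_hyperparameters(z, hyper) -> hyper  (hypothetical optimization step)
    """
    z = None
    for _ in range(n_rounds):
        # Sampling step: draw assignments given the current hyperparameters.
        z = sample_assignments(hyper, z)
        # Optimization step: update hyperparameters given the sampled assignments.
        hyper = optimize_hyperparameters(z, hyper)
    return hyper, z

# Toy stand-ins, NOT an LDA sampler: a fixed "sample" and a contraction
# whose fixed point is 2.0.
def toy_sample(hyper, z_prev):
    return [0, 1, 1]

def toy_optimize(z, hyper):
    return 0.5 * hyper + 1.0

final_hyper, final_z = stochastic_em(toy_sample, toy_optimize, hyper=10.0, n_rounds=50)
```

Passing the two steps in as callables keeps the loop independent of the particular sampler or update rule.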
Notation:
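Objective:
maximize $\log P(\alpha m, \beta n \mid w, z)$, with $w$ the observed words and $z$ the topic assignments,
where $\alpha m$ is only involved in $P(z \mid \alpha m)\,P(\alpha m)$, and $\beta n$ is only involved in $P(w \mid z, \beta n)\,P(\beta n)$. So we can optimize $\alpha m$ and $\beta n$ alternately until convergence.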
Note that is concave in , which means it will converge to the global optimal value.
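To make the split of the objective concrete, here is a minimal sketch of the term that depends on the document-topic hyperparameters. It assumes this term is the standard Dirichlet-multinomial log-probability of the topic assignments, with the Dirichlet over topic proportions integrated out. The names `doc_topic_counts[d][k]` (tokens in document d assigned to topic k) and `alpha_m[k]` (concentration times base measure for topic k) are chosen for illustration. The term for the topic-word hyperparameters has the same shape with topic-word counts.

```python
import math

def log_prob_assignments(doc_topic_counts, alpha_m):
    """Dirichlet-multinomial log-probability of the topic assignments.

    Sum over documents d of
      lgamma(alpha) - lgamma(N_d + alpha)
      + sum_k [lgamma(N_dk + alpha_m[k]) - lgamma(alpha_m[k])],
    where alpha = sum_k alpha_m[k] and N_d = sum_k N_dk.
    """
    alpha = sum(alpha_m)
    total = 0.0
    for counts in doc_topic_counts:
        n_d = sum(counts)
        total += math.lgamma(alpha) - math.lgamma(n_d + alpha)
        for n_dk, am_k in zip(counts, alpha_m):
            total += math.lgamma(n_dk + am_k) - math.lgamma(am_k)
    return total

# One document with a single token, two topics, pseudo-counts (1, 1):
# the token lands in topic 0 with probability 1/2.
single = log_prob_assignments([[1, 0]], [1.0, 1.0])
```

Evaluating this function before and after a hyperparameter update is a simple way to check that the update does not decrease the objective.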
Fixed-point iteration
Minka’s fixed-point iteration [1] is a fast algorithm to optimize the hyperparameters of LDA.
Bound 1
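For any $x, \hat{x} > 0$ and any integer $n \ge 0$,
$$\log\Gamma(x) - \log\Gamma(x+n) \;\ge\; \log\Gamma(\hat{x}) - \log\Gamma(\hat{x}+n) + \left[\Psi(\hat{x}+n) - \Psi(\hat{x})\right](\hat{x} - x),$$
where $\Psi$ is the digamma function.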
Bound 2
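For any $x, \hat{x} > 0$ and any integer $n \ge 1$,
$$\log\Gamma(x+n) - \log\Gamma(x) \;\ge\; \log\Gamma(\hat{x}+n) - \log\Gamma(\hat{x}) + a\,(\log x - \log\hat{x}),$$
where $a = \hat{x}\left[\Psi(\hat{x}+n) - \Psi(\hat{x})\right]$ and $\Psi$ is the digamma function.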
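Both bounds can be checked numerically. The sketch below assumes the usual log-space statement of the two inequalities: a tangent line in $x$ for $\log\Gamma(x) - \log\Gamma(x+n)$, and a tangent line in $\log x$ for $\log\Gamma(x+n) - \log\Gamma(x)$, both touching at $x = \hat{x}$. Function names are hypothetical. The digamma difference is computed exactly for integer $n$ as a finite sum, so only the standard library is needed.

```python
import math

def digamma_diff(x, n):
    """Psi(x + n) - Psi(x) for integer n >= 0, using Psi(x + 1) = Psi(x) + 1/x."""
    return sum(1.0 / (x + i) for i in range(n))

def log_gamma_ratio(x, n):
    """log Gamma(x + n) - log Gamma(x)."""
    return math.lgamma(x + n) - math.lgamma(x)

def bound1_gap(x, x_hat, n):
    """Left side minus right side of bound 1; nonnegative when the bound holds."""
    lhs = -log_gamma_ratio(x, n)
    rhs = -log_gamma_ratio(x_hat, n) + digamma_diff(x_hat, n) * (x_hat - x)
    return lhs - rhs

def bound2_gap(x, x_hat, n):
    """Left side minus right side of bound 2; nonnegative when the bound holds."""
    a = x_hat * digamma_diff(x_hat, n)
    lhs = log_gamma_ratio(x, n)
    rhs = log_gamma_ratio(x_hat, n) + a * (math.log(x) - math.log(x_hat))
    return lhs - rhs

# Evaluate both gaps on a small grid of (x, x_hat, n) values.
grid = [0.05, 0.3, 1.0, 2.5, 10.0]
gaps = [gap(x, x_hat, n)
        for gap in (bound1_gap, bound2_gap)
        for x in grid for x_hat in grid for n in (1, 2, 7)]
print(min(gaps))
```

Each gap vanishes at $x = \hat{x}$, which is what makes the bounds usable for an iterative scheme: maximizing the bound built at the current estimate cannot decrease the true objective.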
Fixed-point iteration for hyperparameters of LDA
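Suppose $[\alpha m_k]^\star$ maximizes the lower bound obtained by applying Bound 1 and Bound 2 at the current value $\alpha m_k$. Setting the derivative of that lower bound to zero, it follows that
$$[\alpha m_k]^\star = \alpha m_k \, \frac{\sum_d \left[\Psi(N_{k|d} + \alpha m_k) - \Psi(\alpha m_k)\right]}{\sum_d \left[\Psi(N_d + \alpha) - \Psi(\alpha)\right]},$$
where $N_{k|d}$ is the number of tokens in document $d$ assigned to topic $k$, $N_d = \sum_k N_{k|d}$, and $\alpha = \sum_k \alpha m_k$. The update for $\beta n_v$ has the same form, with topic-word counts in place of document-topic counts.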
How can the digamma differences $\Psi(x+n) - \Psi(x)$ be evaluated using the recurrence relation of the gamma function?
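A minimal sketch of one way to answer this, assuming integer counts: the recurrence $\Gamma(x+1) = x\,\Gamma(x)$ gives $\Psi(x+1) = \Psi(x) + 1/x$, so $\Psi(x+n) - \Psi(x) = \sum_{i=0}^{n-1} 1/(x+i)$ with no special-function call. The update below is the standard form of Minka's fixed-point step for a Dirichlet-multinomial, written for the document-topic side. Function and variable names are hypothetical, the toy counts are made up for illustration, and every document is assumed to be non-empty.

```python
def digamma_diff(x, n):
    """Psi(x + n) - Psi(x) for integer n >= 0, as the finite sum of 1 / (x + i)."""
    return sum(1.0 / (x + i) for i in range(n))

def fixed_point_step(doc_topic_counts, alpha_m):
    """One fixed-point update of every alpha_m[k] from the current values."""
    alpha = sum(alpha_m)
    # Denominator: sum over documents of Psi(N_d + alpha) - Psi(alpha).
    denom = sum(digamma_diff(alpha, sum(counts)) for counts in doc_topic_counts)
    updated = []
    for k, am_k in enumerate(alpha_m):
        # Numerator: sum over documents of Psi(N_dk + alpha_m[k]) - Psi(alpha_m[k]).
        numer = sum(digamma_diff(am_k, counts[k]) for counts in doc_topic_counts)
        updated.append(am_k * numer / denom)
    return updated

def optimize_alpha_m(doc_topic_counts, alpha_m, n_iters=200, tol=1e-10):
    """Repeat the update until the largest change falls below tol or n_iters is reached."""
    for _ in range(n_iters):
        updated = fixed_point_step(doc_topic_counts, alpha_m)
        if max(abs(u - a) for u, a in zip(updated, alpha_m)) < tol:
            return updated
        alpha_m = updated
    return alpha_m

# Toy document-topic counts (4 documents, 3 topics), made up for illustration.
toy_counts = [[8, 2, 0], [5, 4, 1], [9, 0, 1], [6, 3, 1]]
first_step = fixed_point_step(toy_counts, [1.0, 1.0, 1.0])
fitted = optimize_alpha_m(toy_counts, [1.0, 1.0, 1.0])
```

In practice many documents share the same counts, so the sums can be taken over a histogram of count values rather than over documents, which is where the finite-sum form saves the most work.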
Remarks
- Run a few iterations only: use optimization as a proxy for sampling
- Use asymmetric model: stop words will stand out
- Workaround: fix $m$ and $n$ to uniform distributions and still use fixed-point iteration to optimize the scalars $\alpha$ and $\beta$.
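The workaround in the last remark can be sketched as follows. Assuming the base measure is fixed to uniform, every topic has pseudo-count $\alpha / K$, and maximizing the same kind of lower bound over the single scalar $\alpha$ gives the update coded below. Names and toy counts are hypothetical. The scalar $\beta$ would be handled the same way, with topic-word counts and $V$ in place of $K$.

```python
def digamma_diff(x, n):
    """Psi(x + n) - Psi(x) for integer n >= 0, as the finite sum of 1 / (x + i)."""
    return sum(1.0 / (x + i) for i in range(n))

def symmetric_step(doc_topic_counts, alpha):
    """One fixed-point update of the scalar concentration alpha with a uniform base measure."""
    num_topics = len(doc_topic_counts[0])
    per_topic = alpha / num_topics
    # Numerator: sum over documents and topics of Psi(N_dk + alpha/K) - Psi(alpha/K).
    numer = sum(digamma_diff(per_topic, n_dk)
                for counts in doc_topic_counts for n_dk in counts)
    # Denominator: K times the sum over documents of Psi(N_d + alpha) - Psi(alpha).
    denom = num_topics * sum(digamma_diff(alpha, sum(counts))
                             for counts in doc_topic_counts)
    return alpha * numer / denom

# Toy document-topic counts (4 documents, 3 topics), made up for illustration.
toy_counts = [[8, 2, 0], [5, 4, 1], [9, 0, 1], [6, 3, 1]]
alpha_next = symmetric_step(toy_counts, 3.0)
```

Only one number is updated per side, so this variant is cheaper per iteration, at the cost of losing the per-topic and per-word asymmetry.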