Optimal Number of Topics in LDA Models
Andrés Corrada-Emmanuel
UMass
Abstract
The Latent Dirichlet Allocation model (Blei, Ng, and Jordan, 2003) describes a
corpus using a global set of unigram distributions. When the model is applied
to a text corpus, these distributions look like semantic "topics". I'll
describe ongoing work to develop a simple criterion for finding the
"optimal" number of topics using ideas from Information Geometry.
Back to ML Lunch home