Machine Learning and Friends Lunch






Optimal Number of Topics in LDA Models


Andrés Corrada-Emmanuel
UMass

Abstract


The Latent Dirichlet Allocation model (Blei, Ng, and Jordan, 2003)
represents a corpus with a global set of unigram distributions. When
applied to a text corpus, these distributions look like semantic
"topics". I'll describe ongoing work to develop a simple criterion for
finding the "optimal" number of topics using ideas from Information
Geometry.
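
For readers unfamiliar with the model, here is a minimal sketch of the standard LDA generative process, in which the number of topics is a parameter that must be fixed in advance. It is only an illustration, not the criterion discussed in the talk. The function name `sample_lda_corpus`, the symmetric hyperparameters `alpha` and `beta`, and all default values are assumptions chosen for the example.

```python
import numpy as np


def sample_lda_corpus(n_docs, doc_len, n_topics, vocab_size,
                      alpha=0.5, beta=0.5, seed=0):
    """Draw a toy corpus from the LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters; their
    values here are illustrative assumptions, not taken from the talk.
    """
    rng = np.random.default_rng(seed)

    # Global set of unigram distributions ("topics"):
    # one row per topic, each row a distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

    docs = []
    for _ in range(n_docs):
        # Per-document mixing proportions over the global topics.
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # Choose a topic for each word position, then a word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        docs.append(words)
    return topics, docs


if __name__ == "__main__":
    topics, docs = sample_lda_corpus(n_docs=5, doc_len=20,
                                     n_topics=3, vocab_size=50)
    print(topics.shape)   # one unigram distribution per topic
    print(docs[0])        # word ids of the first document
```

Here `n_topics` is the quantity whose "optimal" value the talk addresses; the sketch simply takes it as given.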
