Probabilistic topic models are often used to analyze and extract semantic topics from large text collections. In this talk I will first introduce a two-layer undirected graphical model, called the Replicated Softmax, that can be used to automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. I will present efficient learning and inference algorithms for this model, and show how a Monte Carlo method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. I will further demonstrate that the proposed model generalizes much better than Latent Dirichlet Allocation, in terms of both the log-probability of held-out documents and retrieval accuracy.
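
For readers unfamiliar with the model, the following is a minimal sketch of the Replicated Softmax conditionals and one learning update, based on the commonly published formulation rather than on anything stated in this abstract. It assumes a document is a vector of word counts, that the hidden biases are scaled by the document length, and that learning uses one-step contrastive divergence; the abstract itself does not name the learning algorithm. All function and variable names (`hidden_probs`, `word_probs`, `cd1_step`, `W`, `a`, `b`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, a):
    """p(h_j = 1 | v) for a word-count vector v of shape (K,).

    W has shape (F, K): F hidden topic units, K vocabulary words.
    """
    D = v.sum()  # document length scales the hidden biases
    return sigmoid(D * a + W @ v)

def word_probs(h, W, b):
    """Softmax over the vocabulary given binary hidden states h of shape (F,)."""
    logits = b + h @ W
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def cd1_step(v, W, a, b, rng, lr=0.01):
    """One contrastive-divergence (CD-1) update on a single document.

    Assumption: CD-1 is used here only as an illustration of a learning rule.
    """
    D = int(v.sum())
    ph_data = hidden_probs(v, W, a)
    h = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Reconstruct a document of the same length from the sampled topics.
    v_model = rng.multinomial(D, word_probs(h, W, b)).astype(float)
    ph_model = hidden_probs(v_model, W, a)
    W = W + lr * (np.outer(ph_data, v) - np.outer(ph_model, v_model))
    a = a + lr * D * (ph_data - ph_model)
    b = b + lr * (v - v_model)
    return W, a, b

# Toy usage: a 6-word vocabulary, 3 hidden units, one 6-word document.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((3, 6))
a = np.zeros(3)
b = np.zeros(6)
v = np.array([2.0, 0.0, 1.0, 0.0, 3.0, 0.0])
W, a, b = cd1_step(v, W, a, b, rng)
```

The reconstruction is drawn with the same number of words as the input document, which is what lets a single set of weights be shared across documents of different lengths in this formulation.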

In the second part of the talk, I will introduce a class of probabilistic generative models called Deep Belief Networks, which contain many layers of latent variables, with the bottom layer forming a Replicated Softmax model. I will then show that the resulting deep graphical model is able both to discover meaningful semantic topics and to learn latent representations that work much better for document retrieval.

