Machine Learning and Friends Lunch






Density Allocation for Modeling Discrete Data


Victor Lavrenko
UMass

Abstract


The talk will discuss statistical techniques for modeling collections of unstructured or semi-structured data. We will begin with the popular approaches to the problem, starting with simple unigram models, extending them to cluster-based mixture models, and moving on to two state-of-the-art latent aspect models: pLSI and LDA. We will then propose a simple generalization of these models: generative density allocation. The new formalism gives us an intuition for the relative strengths and weaknesses of the popular models and, more importantly, allows us to develop a new generative model based on non-parametric density estimates. We will look at two variants of the model: one based on the Dirac-delta kernel (known as Relevance Models), and one based on the Dirichlet kernel. We will discuss how the new model can be applied to web search, cross-language retrieval, topic detection and tracking, recognition of objects in images, and recognition of handwritten words. In each case the new model consistently outperforms state-of-the-art baselines.
