Machine Learning and Friends Lunch






Generative Classification Models for Information Retrieval


Ramesh Nallapati
UMass

Abstract


Information Retrieval (IR) typically deals with retrieving objects, such as documents, that are relevant to a user's information need from a large collection. IR researchers have developed several probabilistic approaches to this problem, ranging from the Binary Independence Retrieval (BIR) model of the 1970s to the more recent language models and the Relevance model.

However, some of these models lack a unifying framework across settings such as simple query-based retrieval, relevance feedback, and pseudo-relevance feedback. Additionally, the graphical representations of these models do not fully explain their parameter estimation or ranking functions.

In this work, we treat IR as a binary classification problem in the framework of the BIR model. We show that this simple framework allows us to model the settings mentioned above in a unified manner. Our parameter estimation and ranking functions are based on the EM algorithm and follow directly from the graphical representation. Preliminary experiments show that our new model, using a smoothed-Dirichlet class conditional, achieves promising results in ad hoc retrieval and topic tracking.
