Machine Learning and Friends Lunch

Regularizing Ad Hoc Information Retrieval Scores

Abstract

Ad hoc information retrieval refers to classifying the documents in a collection as relevant or non-relevant given a short query and no example documents. It is one of the best-known text classification problems and has a rich evaluation suite. Despite this, ad hoc retrieval remains the least studied of these problems in the context of machine learning. This talk will begin by placing ad hoc retrieval in the context of machine learning research. I will attempt to draw connections to related work in machine learning and point out areas of potential research.

The second part of this talk will focus on a specific case where tools from machine learning, specifically manifold regularization, have helped us understand and improve established results from information retrieval. We demonstrate that regularized scores rank documents consistently and significantly better than unregularized scores across a variety of initial retrieval algorithms. If time permits, I will touch on proposed extensions to this work.
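
For readers unfamiliar with the idea, the following is a minimal sketch of one common form of graph-based score regularization. It is an illustration under stated assumptions, not necessarily the formulation presented in the talk. It assumes a symmetric document affinity matrix is available and smooths the initial scores y by minimizing ||f - y||^2 + lam * f^T L f, where L is the unnormalized graph Laplacian. The function name regularize_scores, the parameter lam, and the toy data are all hypothetical.

```python
import numpy as np

def regularize_scores(scores, affinity, lam=1.0):
    """Smooth initial retrieval scores over a document similarity graph.

    Assumed objective (illustrative only): ||f - y||^2 + lam * f^T L f,
    where y are the initial scores and L = D - W is the unnormalized
    Laplacian of the affinity matrix W. The minimizer has the closed
    form f = (I + lam * L)^{-1} y.
    """
    y = np.asarray(scores, dtype=float)
    W = np.asarray(affinity, dtype=float)
    W = (W + W.T) / 2.0           # enforce symmetry
    np.fill_diagonal(W, 0.0)      # ignore self-similarity
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)

# Toy example: documents 0 and 1 are similar, document 2 is isolated.
initial = [1.0, 0.0, 0.5]
affinity = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]
smoothed = regularize_scores(initial, affinity, lam=1.0)
print(smoothed)  # approximately [0.667, 0.333, 0.5]
```

In this toy case the scores of the two similar documents are pulled toward each other, while the isolated document keeps its initial score. Larger values of lam smooth more aggressively.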
