Machine Learning and Friends Lunch
Beyond Bags of Words: A Markov Random Field Model for Information Retrieval
Don Metzler
UMass

Abstract
Current state-of-the-art information retrieval models treat documents and queries as bags of words. There have been many attempts to go beyond this simple representation. Unfortunately, few have shown consistent improvements in retrieval effectiveness across a wide range of tasks and data sets. Here, we propose a new statistical model for information retrieval based on Markov random fields. The proposed model goes beyond the bag-of-words assumption by allowing dependencies between terms to be incorporated into the model. This allows a variety of textual and non-textual features to be combined easily within a single model. Within this framework, we explore the theoretical issues involved, parameter estimation, feature selection, and query expansion. We give experimental results from a number of information retrieval tasks, such as ad hoc retrieval and web search.