UMass Machine Learning and Friends Lunch: Modeling Reformulation Using Query Distributions

Abstract: Query reformulation modifies the original query so that it better matches the vocabulary of the relevant documents, thereby improving ranking effectiveness. Previous models typically generate words and phrases related to the original query but do not consider how these words and phrases would fit together in actual queries. In this paper, a novel framework is proposed that models reformulation as a distribution over actual queries, where each query is a variation of the original query. An implementation of this framework that uses only publicly available resources is also proposed, which makes fair comparisons with other methods on TREC collections possible. Specifically, this implementation consists of two steps: a query generation step, which analyzes the passages containing query words to generate reformulated queries, and a probability estimation step, which learns a distribution over the reformulated queries by optimizing retrieval performance. Experiments on TREC collections show that the proposed model can significantly outperform previous reformulation models.
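The abstract does not spell out how a distribution of reformulated queries is turned into a single document ranking. One natural reading is sketched below in Python, under stated assumptions: each document is scored by its expected retrieval score under the query distribution, score(D) = Σ_q P(q | Q) · score(D, q), with Dirichlet-smoothed query likelihood as the per-query scorer. The function names `query_likelihood` and `rank_with_query_distribution`, the toy documents, and the probabilities are all hypothetical; the query generation and probability estimation steps themselves are not shown.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Dirichlet-smoothed log P(query | doc); an assumed per-query scorer."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    doc_len, coll_len = len(doc), len(collection)
    score = 0.0
    for term in query:
        # Collection probability with a small floor so unseen terms avoid log(0).
        p_coll = (coll_tf[term] + 0.5) / (coll_len + 1.0)
        score += math.log((doc_tf[term] + mu * p_coll) / (doc_len + mu))
    return score

def rank_with_query_distribution(query_dist, docs, collection):
    """Rank documents by their expected score under a (hypothetical) query distribution.

    query_dist maps each reformulated query (a tuple of terms) to its weight;
    weights are normalized so they sum to one.
    """
    total = sum(query_dist.values())
    scores = {
        doc_id: sum(
            (weight / total) * query_likelihood(query, doc, collection)
            for query, weight in query_dist.items()
        )
        for doc_id, doc in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: hypothetical documents and reformulations of the query "oil spill".
docs = {
    "d1": "oil spill cleanup in the gulf".split(),
    "d2": "stock market oil prices rise".split(),
    "d3": "gulf coast spill response efforts".split(),
}
collection = [term for doc in docs.values() for term in doc]
query_dist = {
    ("oil", "spill"): 0.6,
    ("gulf", "spill", "cleanup"): 0.3,
    ("oil", "prices"): 0.1,
}
ranking = rank_with_query_distribution(query_dist, docs, collection)
print(ranking)
```

Because every candidate query is scored as a whole, a document is rewarded for matching terms that co-occur in a plausible reformulation (here "gulf spill cleanup"), rather than for matching isolated expansion terms. This is the distinction the abstract draws against word-level reformulation models.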

Bio: Xiaobing Xue is a PhD candidate at the Center for Intelligent Information Retrieval (CIIR). He is broadly interested in information retrieval, natural language processing, and large-scale machine learning, as well as their practical applications. His current research focuses on query reformulation, which modifies the original query so that it better matches the vocabulary of relevant documents, thereby improving the relevance of search systems. His previous research includes question and answer retrieval, patent retrieval, multi-modal retrieval, and text categorization.