UMass Machine Learning and Friends Lunch: Beyond Keywords: Finding Information More Accurately and Easily Using Natural Language

While keywords are both intuitive and effective for simple navigational and informational web searches (e.g., "alaska airlines" or "american revolution"), not all information needs are so simple. For example, on community question answering sites we find questions like "How have dramatic shifts in terrorists resulted in an equally dramatic shift in terrorist organizations?" or "Are concerns raised by the media justified about global warming and stem cell research?" Natural language (NL) lets users easily express such arbitrarily complicated queries, but search engines generally perform poorly on NL queries. At the same time, although a more effective keyword query usually exists for retrieving the desired information, users often find it difficult to formulate one for a complex question. Consequently, supporting automatic search for such questions remains an open challenge.

I adopt the approach of letting people express their questions naturally and investigate how automatic retrieval for such questions can be improved. To this end, I describe a learning framework that builds on recent ideas from "learning to rank" to better estimate traditional term-based retrieval models. Given example queries and their relevant documents, the model learns to predict effective context-sensitive term weights for NL queries, and its feature space can be incrementally extended from individual terms to term interactions or other latent representations. Empirical evaluation on several datasets shows that better estimating how important each NL query term is to the query's core information need improves retrieval accuracy.
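As a rough illustration of the general idea (learning term weights for an NL query from relevance judgments, then using them in a term-based retrieval model), the toy sketch below fits a linear term-weighting function with a pairwise learning-to-rank update. Everything in it is an assumption chosen for illustration, not the framework from the talk: the Dirichlet-smoothed query-likelihood scorer, the two hypothetical term features (a bias and a stopword flag), the constants `MU` and `lr`, and the helper names. Because each term weight is `theta . f(t)`, the weighted score is linear in `theta`, so a simple hinge-loss update on (relevant, non-relevant) document pairs suffices.

```python
import math
from collections import Counter

# Hypothetical choices for illustration only; not the model from the talk.
MU = 10.0          # Dirichlet smoothing parameter (assumed value)
N_FEATURES = 2     # length of the per-term feature vector below
STOPWORDS = {"what", "is", "the", "of", "a", "an", "by", "are", "about"}


def collection_probs(docs):
    """Background model P(t|C) estimated from all documents."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def log_p(term, doc, p_c):
    """Dirichlet-smoothed log P(term|doc)."""
    p_bg = p_c.get(term, 1e-9)  # floor avoids log(0) for unseen terms
    return math.log((doc.count(term) + MU * p_bg) / (len(doc) + MU))


def term_features(term):
    """Hypothetical per-term features: [bias, stopword indicator]."""
    return [1.0, 1.0 if term in STOPWORDS else 0.0]


def phi(query, doc, p_c):
    """Sum over query terms of f(t) * log P(t|d).

    With term weight w(t) = theta . f(t), the weighted query-likelihood
    score sum_t w(t) * log P(t|d) equals theta . phi(query, doc).
    """
    vec = [0.0] * N_FEATURES
    for t in query:
        lp = log_p(t, doc, p_c)
        for i, f in enumerate(term_features(t)):
            vec[i] += f * lp
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train(triples, p_c, lr=0.1, max_epochs=100):
    """Pairwise hinge-loss updates aiming for score(rel) - score(nonrel) >= 1."""
    theta = [0.0] * N_FEATURES
    for _ in range(max_epochs):
        updated = False
        for query, rel, nonrel in triples:
            phi_r, phi_n = phi(query, rel, p_c), phi(query, nonrel, p_c)
            diff = [r - n for r, n in zip(phi_r, phi_n)]
            if dot(theta, diff) < 1.0:  # margin violated: take a subgradient step
                theta = [w + lr * d for w, d in zip(theta, diff)]
                updated = True
        if not updated:  # every pair satisfies the margin
            break
    return theta


def term_weight(term, theta):
    """Learned weight of a query term under the linear model."""
    return dot(theta, term_features(term))


query = "what is the cause of global warming".split()
relevant = "global warming cause carbon emissions".split()
nonrelevant = "what is the name of the band".split()
p_c = collection_probs([relevant, nonrelevant])
theta = train([(query, relevant, nonrelevant)], p_c)
for t in query:
    print(f"{t:8s} {term_weight(t, theta):+.3f}")
```

On this single toy training pair, the weight learned for the stopword feature is negative, so function words such as "the" end up weighted below content terms such as "global". Modeling term interactions, as the abstract suggests, would mean adding features (and a sum over term pairs in `phi`); the pairwise training loop itself would not need to change.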