Machine Learning and Friends Lunch






Resource-Bounded Information Gathering for Entity Resolution


Pallika Kanani
UMASS

Abstract

The goal of entity resolution is to identify and reconcile references to the same real-world entity. Entity resolution for authors in the citation analysis domain suffers from insufficient information. We augment our author coreference model with additional evidence obtained by querying the web and show significant improvements in accuracy and confidence. The problem is formulated as partitioning a weighted, undirected, fully connected graph, where each partition represents the set of citations by one author. Under resource constraints, we need an efficient procedure that selects a subset of the queries and still achieves a significant performance improvement. Making this selection well requires considering the overall graph structure. Next, we expand the graph by adding mentions obtained from the web and partition it, so as to improve performance on the original graph. We formally describe the problem of resource-bounded information gathering in each of these contexts, and also explore some heuristic-based solutions.
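
To make the formulation concrete, here is a minimal Python sketch of budgeted query selection on a weighted citation graph. It is an illustration under stated assumptions, not the method presented in the talk: the edge weights, the "closest to zero is most uncertain" selection rule, the connected-components partitioner, and the names `select_queries` and `partition` are all hypothetical. In particular, this naive rule scores each edge in isolation and ignores the overall graph structure that the abstract says matters.

```python
def select_queries(weights, budget):
    """Pick the `budget` edges whose weights are closest to zero.

    Assumption: a weight near zero means the model is unsure whether the
    two citations share an author, so a web query there is most useful.
    """
    ranked = sorted(weights, key=lambda edge: abs(weights[edge]))
    return ranked[:budget]


def partition(nodes, weights, threshold=0.0):
    """Group nodes linked by edges above `threshold`.

    A simple stand-in for graph partitioning: connected components
    computed with union-find.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), w in weights.items():
        if w > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(clusters.values(), key=sorted)


# Toy data (invented for illustration): four citations, pairwise scores.
nodes = ["c1", "c2", "c3", "c4"]
weights = {
    ("c1", "c2"): 0.9,
    ("c1", "c3"): 0.1,
    ("c1", "c4"): -0.8,
    ("c2", "c3"): -0.05,
    ("c2", "c4"): -0.7,
    ("c3", "c4"): -0.6,
}

queries = select_queries(weights, budget=2)
before = partition(nodes, weights)

# Hypothetical outcome of a web query on one selected edge:
# the new evidence suggests c1 and c3 are different authors.
updated = dict(weights)
updated[("c1", "c3")] = -0.4
after = partition(nodes, updated)
```

With a budget of two, the sketch spends its queries on the two least certain edges, and a single changed weight is enough to split `c3` out of the cluster it previously shared with `c1` and `c2`.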
