CMPSCI Theory Seminar

Joint meeting with BIGIALS Seminar

(Bioinformatics, Genomics, and Interdisciplinary

Approaches to the Life Sciences on Systems Biology)

A stability based method for discovering structure in clustered data

Asa Ben-Hur

Stanford University, Dept. of Biochemistry

26 November 2002

4:00 p.m., Room 140 Computer Science Building

Most clustering algorithms provide a clustering of a dataset regardless of whether the data actually has cluster structure. To address this issue, we present a method for assessing the presence of structure in clustered data. The method is based on the idea that a "good" clustering should be stable under perturbations of the data. We characterize stability using a similarity measure between a reference clustering and clusterings obtained from sub-samples of the data. High similarities indicate a stable clustering pattern. We argue that stability is a desirable feature of a clustering solution that implies the existence of cluster structure.
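The sub-sampling idea can be illustrated with a minimal sketch. It is not the speaker's implementation: the abstract does not name the similarity measure, so the Jaccard index over co-clustered pairs is assumed here, and the function names, the 80% sub-sample fraction, and the toy one-dimensional clusterer `gap_clustering` are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def jaccard_similarity(labels_a, labels_b):
    """Jaccard index over co-clustered pairs (an assumed similarity measure).

    Each argument maps a point id to a cluster label. Only points present
    in both labelings are compared.
    """
    common = sorted(set(labels_a) & set(labels_b))
    both = only_one = 0
    for i, j in combinations(common, 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a or same_b:
            only_one += 1
    total = both + only_one
    return both / total if total else 1.0


def gap_clustering(values, k):
    """Hypothetical toy 1-D clusterer: cut sorted values at the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = sorted(
        range(1, len(order)),
        key=lambda p: values[order[p]] - values[order[p - 1]],
        reverse=True,
    )
    cuts = set(positions[: k - 1])
    labels = [0] * len(values)
    current = 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1
        labels[idx] = current
    return labels


def stability_scores(data, cluster_fn, k, n_subsamples=20, fraction=0.8, seed=0):
    """Similarity of each sub-sample clustering to the full-data reference clustering."""
    rng = random.Random(seed)
    ids = list(range(len(data)))
    reference = dict(zip(ids, cluster_fn(data, k)))
    size = max(2, int(fraction * len(data)))
    scores = []
    for _ in range(n_subsamples):
        sample = sorted(rng.sample(ids, size))
        labels = cluster_fn([data[i] for i in sample], k)
        scores.append(jaccard_similarity(reference, dict(zip(sample, labels))))
    return scores


if __name__ == "__main__":
    # Two well-separated groups on a line (made-up illustrative data).
    data = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
    for k in (2, 3, 4):
        scores = stability_scores(data, gap_clustering, k)
        print(k, sum(scores) / len(scores))
```

Under these assumptions, `cluster_fn` can be any function that takes the data and a number of clusters and returns one label per point, so the same loop applies to a hierarchical or any other clustering algorithm. Comparing the score distributions across values of `k` is one way to look for the number of clusters at which the clustering stays stable.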

The proposed method can be used with any clustering algorithm. It provides a rational means of choosing the optimal number of clusters and of setting other aspects of the clustering algorithm, and it can also detect the lack of structure in data. We show results on several datasets using a hierarchical clustering algorithm, and use the method to demonstrate that retaining only a few leading principal components enhances cluster structure.

After completing a PhD in information systems at the Industrial Engineering faculty of the Technion, Asa joined BioWolf, a bioinformatics startup. Following its bankruptcy, he recently joined the Brutlag bioinformatics group at Stanford as a postdoc.

Refreshments at 3:30 PM in the Atrium, outside the presentation room.

Last modified 18 November 2002