Machine Learning and Friends Lunch

Neighbourhood Components Analysis


Sam Roweis
University of Toronto

Abstract


Say you want to do K-Nearest Neighbour classification. Besides
selecting K, you also have to choose a distance function in order to
define "nearest". I'll talk about a novel method for *learning*, from
the data itself, a Mahalanobis distance measure to be used in the KNN
classification algorithm. The algorithm, Neighbourhood Components
Analysis (NCA), directly maximizes a stochastic variant of
the leave-one-out KNN score on the training set. It can also learn a
low-dimensional linear embedding of labeled data that can be used for
data visualization and very fast classification in high dimensions. Of
course, the resulting classification model is non-parametric, making
no assumptions about the shape of the class distributions or the
boundaries between them. If time permits, I'll also talk about newer
work on learning the same kind of distance metric for use inside a
Gaussian Kernel SVM classifier.

(Joint work with Jacob Goldberger)
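
To make the objective concrete, here is a minimal NumPy sketch of the
stochastic leave-one-out score NCA maximizes, with its gradient, written
from the description above. The toy data, learning rate, and variable
names are illustrative assumptions, not details from the talk.

    import numpy as np

    def nca_objective_and_grad(A, X, y):
        """Soft leave-one-out KNN accuracy under the metric A^T A, and
        its gradient with respect to the linear map A (shape dim x d)."""
        Z = X @ A.T                                # project the points
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)               # a point never picks itself
        logits = -d2
        logits -= logits.max(1, keepdims=True)     # numerical stability
        P = np.exp(logits)
        P /= P.sum(1, keepdims=True)               # P[i, j]: i picks j as neighbour
        same = (y[:, None] == y[None, :]).astype(float)
        p_i = (P * same).sum(1)                    # P(i is classified correctly)
        f = p_i.sum()                              # expected leave-one-out score
        # df/dA = 2 A X^T (diag(S 1) - S) X, with S = W + W^T and
        # W[i, k] = p_i * P[i, k] - P[i, k] * [y_i == y_k].
        W = p_i[:, None] * P - P * same
        S = W + W.T
        L = np.diag(S.sum(1)) - S
        return f, 2.0 * A @ (X.T @ L @ X)

    # Tiny demo: gradient ascent on two noisy Gaussian classes. A 2 x 5
    # map also gives the low-dimensional embedding mentioned above.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 5)),
                   rng.normal(1.5, 1.0, (20, 5))])
    y = np.array([0] * 20 + [1] * 20)
    A = 0.1 * rng.standard_normal((2, 5))
    for _ in range(100):
        f, g = nca_objective_and_grad(A, X, y)
        A += 0.01 * g                              # ascend: we maximize f
    print(f"soft LOO accuracy ~ {f / len(y):.2f}")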
