Efficient Training of Additive Classifiers

This page contains the code and details of the paper:

Max-Margin Additive Classifiers for Detection.
Subhransu Maji and Alexander C. Berg
In Proceedings, ICCV 2009, Kyoto, Japan.
pdf

In this paper we propose to train discriminative additive models which mimimize the hingle loss on the training data + regularization. Standard l2 regularization (w'w), lets one use a standard linear solver like LIBLINEAR or PEGASOS to train, while using the regularization proposed in our paper (w'Hw), requires the custom solver we build. All these techniques approximate the classifier that can be learned using standard kernel SVM training using the min (or histogram intersection) kernel. If you train a kernel SVM using the min kernel you may still use the fast prediction technique proposed in our earlier paper during test.
Note that min is a conditionally positive definite kernel for all features and can be used with kernel SVMs even when the features are not positive (see proof in the paper).

Code for encoding + training piecewise linear models:

Source code : pwl_sgd.tar.gz
Computes the phi_1 and phi_2 encoding proposed in the paper
Output is sparse so can be used directly with LIBLINEAR
Minimizes hinge loss + w'Hw, using modified PEGASOS.
Example toy dataset experiments included.

Code for running experiments on Caltech 101 dataset

Source code : ct101.tar.gz
Precomputed features + code for benchmarking on dataset
Requires LIBLINEAR & PWLSGD (pwl_sgd.tar.gz)

Code for running experiments on DC pedestrian dataset:

Source code : iccv09-dc-ped.tar.gz
Code for computing features/running benchmarks on dataset
Requires LIBLINEAR & PWLSGD (pwl_sgd.tar.gz)

FAQ

How does training time compare to training a linear SVM?
Training with the l2 regularization (w'w) is within a small constant of the training time of a linear SVM as the encoding step produces features that have atmost twice the number of non zero entires of the raw features. With the modified reqularization (w'Hw) our custom solver takes slightly longer. All these are often orders of magnitude less than training a standard svm using LIBSVM and the min kernel.
Is there a loss of accuracy over min kernel SVM?
Yes, but they are small and the savings in training time may justify it. Its valuable for experimenting with features and other hyperparameters. The gains in accuracy over a linear SVM can be very significant as we conclude in many experiments in our paper.
When should we use it?
Always. If you can afford to train a linear SVM you should be able to train our models. The test times are similar as the encoding step is fast. Typically our classifiers are only 5-6 times slower than a linear SVM during test time.
What was your CVPR'08 paper about?
We just addressed fast testing for min (or general additive) kernels SVMS. You may still use that method during test for SVM models trained using min kernel. This paper uses the ideas of the piecewise linear approximation of the earlier paper to do both training and testing efficiently.