Efficient Training of Additive Classifiers
This page contains the code and details of the paper:
Max-Margin Additive Classifiers for Detection.
Subhransu Maji and Alexander C. Berg
In Proceedings, ICCV 2009, Kyoto, Japan.
In this paper we propose to train discriminative additive models which
mimimize the hingle loss on the training data +
regularization. Standard l2 regularization (w'w), lets one use a standard
linear solver like
to train, while using the
regularization proposed in our paper (w'Hw), requires the custom
solver we build. All these techniques approximate the classifier that
can be learned using standard kernel SVM training using the min (or
histogram intersection) kernel. If you train a kernel SVM using the min kernel you
may still use the fast prediction technique proposed in
paper during test.
Code for encoding + training piecewise linear models:
Note that min is a conditionally positive definite kernel for all
features and can be used with kernel SVMs even when the features are
not positive (see proof in the paper).
Code for running experiments on Caltech 101 dataset
Code for running experiments on
DC pedestrian dataset:
- Source code : pwl_sgd.tar.gz
- Computes the phi_1 and phi_2 encoding proposed in the paper
- Output is sparse so can be used directly with LIBLINEAR
- Minimizes hinge loss + w'Hw, using modified PEGASOS.
- Example toy dataset experiments included.
- How does training time compare to training a linear SVM?
Training with the l2 regularization (w'w) is within a small
constant of the training time of a linear SVM as the encoding step
produces features that have atmost twice the number of non zero
entires of the raw features. With the modified reqularization
(w'Hw) our custom solver takes slightly longer. All these are
often orders of magnitude less than training a standard svm using
LIBSVM and the min kernel.
- Is there a loss of accuracy over min kernel SVM?
Yes, but they are small and the savings in training time may justify it. Its valuable
for experimenting with features and other hyperparameters. The
gains in accuracy over a linear SVM can be very significant as we
conclude in many experiments in our paper.
- When should we use it?
Always. If you can afford to train a linear SVM you should be able
to train our models. The test times are similar as the encoding step
is fast. Typically our classifiers are only 5-6 times slower than a
linear SVM during test time.
- What was your CVPR'08 paper about?
We just addressed fast testing for min (or general additive) kernels
may still use that method during test for SVM models trained using min
kernel. This paper uses the ideas of the piecewise linear
approximation of the earlier paper to do both training and testing