Learning, inference, and prediction in the presence of missing
data are pervasive problems in machine learning and statistical
data analysis. This thesis focuses on the problems of
collaborative prediction with non-random missing data and
classification with missing features. We begin by presenting and
elaborating on the theory of missing data due to Little and
Rubin. We place particular emphasis on the missing at random
assumption in the multivariate setting with arbitrary patterns of
missing data. We derive inference and prediction methods for data
that are missing at random under a variety of probabilistic
models, including finite mixture models, Dirichlet process mixture
models, and factor analysis.
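
For reference, the missing at random assumption can be stated as
follows. The notation is an assumption introduced here, not taken
from the text above: x^obs and x^mis denote the observed and
missing parts of a data vector, r the vector of response
indicators, theta the data model parameters, and mu the parameters
of the missing data process.

```latex
% Missing at random (MAR): the response pattern may depend on the
% observed values but not on the missing ones.
\begin{align}
  P(r \mid x^{\mathrm{obs}}, x^{\mathrm{mis}}, \mu)
    &= P(r \mid x^{\mathrm{obs}}, \mu)
    \quad \text{for all } x^{\mathrm{mis}}, \\
% Under MAR the joint likelihood of the observed data and the
% response pattern factorizes, so the term involving \mu does not
% depend on \theta.
  P(x^{\mathrm{obs}}, r \mid \theta, \mu)
    &= P(r \mid x^{\mathrm{obs}}, \mu)
       \int P(x^{\mathrm{obs}}, x^{\mathrm{mis}} \mid \theta)
       \, dx^{\mathrm{mis}}.
\end{align}
```

When theta and mu are distinct parameters, this factorization
means that inference for theta can be based on the observed data
likelihood alone, without modelling the missing data process.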

Based on this foundation, we develop several novel models and
inference procedures for both the collaborative prediction
problem and the problem of classification with missing features.
We develop models and methods for collaborative prediction with
non-random missing data by combining standard models for complete
data with models of the missing data process. Using a novel
recommender system data set and experimental protocol, we show
that each proposed method achieves a substantial increase in
rating prediction performance compared to models that assume
missing ratings are missing at random.

We describe several strategies for classification with missing
features, including the use of generative classifiers and the
combination of standard discriminative classifiers with single
imputation, multiple imputation, classification in subspaces, and
an approach that modifies the classifier input representation to
include response indicators. Results on real and synthetic data
sets show that, in some cases, methods that do not learn a
detailed model of the feature space can achieve performance gains
over baseline methods.
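
Two of the strategies listed above can be sketched in a few lines
as a minimal illustration, not as the implementation used in this
thesis. The function names, the use of None to mark a missing
feature, the choice of the per-feature mean for single imputation,
and the zero fill value are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing feature (assumed convention)


def mean_impute(rows: Sequence[Row]) -> List[List[float]]:
    """Single imputation: replace each missing value with the mean of
    the observed values of that feature (0.0 if never observed)."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if v is None else float(v) for j, v in enumerate(r)]
        for r in rows
    ]


def augment_with_response_indicators(rows: Sequence[Row]) -> List[List[float]]:
    """Fill missing values with 0.0 and append one 0/1 response
    indicator per feature (1.0 = observed, 0.0 = missing)."""
    augmented = []
    for r in rows:
        filled = [0.0 if v is None else float(v) for v in r]
        indicators = [0.0 if v is None else 1.0 for v in r]
        augmented.append(filled + indicators)
    return augmented


# Toy data: three cases, three features, one missing value per case.
X = [
    [1.0, None, 3.0],
    [None, 2.0, 5.0],
    [4.0, 6.0, None],
]
X_imputed = mean_impute(X)
X_augmented = augment_with_response_indicators(X)
```

Either output is a complete, fixed-length feature matrix, so it
can be passed to any standard discriminative classifier. Neither
function learns a detailed model of the feature space.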