Rating prediction is an important application and a popular
research topic in collaborative filtering. However, the validity
of both learning algorithms and standard testing procedures
rests on the assumption that missing ratings are missing at
random (MAR). In this paper, we present the results
of a user study in which we collect a random sample of ratings
from current users of an online radio service. An analysis of
the rating data collected in the study shows that the properties
of the random sample differ markedly from those of ratings for
user-selected songs. When asked to report on their own rating
behaviour, a large number of users indicate they believe their
opinion of a song \textit{does} affect whether they choose to
rate that song, a violation of the MAR condition. Finally, we
present experimental results showing that incorporating an
explicit model of the missing data mechanism can lead to
significant improvements in prediction performance on the random
sample of ratings.