CS 335: Matrix Factorization for Movie Recommendations

Dan Sheldon

Movie Recommendations

          Gladiator   Silence of the Lambs   WALL-E   Toy Story
Alice         5                4                1
Bob                            5                2
Carol         5
David                                           5         5
Eve           5                4

What movie should I recommend to Bob?
Will Carol like WALL-E?

Goal: Fill in entries of the “rating matrix”

Problem Setup

We only get to see some of the entries of the rating matrix and want to fill in the rest.

Our data is a list of \(L\) ratings, where the \(k\)th rating is a triple \((i_k, j_k, r(i_k, j_k))\) meaning that user \(i_k\) gave movie \(j_k\) the rating \(r(i_k, j_k)\).

Example: in the rating matrix above, we observed \(L = 10\) ratings:

\(k\)   \(i_k\)   \(j_k\)   \(r(i_k, j_k)\)   Comment
 1       1         1             5            (Alice, Gladiator, 5)
 2       1         2             4            (Alice, Silence, 4)
 3       1         3             1            (Alice, WALL-E, 1)
 4       2         2             5            (Bob, Silence, 5)
 …       …         …             …            …
 9       5         1             5            (Eve, Gladiator, 5)
10       5         2             4            (Eve, Silence, 4)
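In code, the observed ratings are naturally stored as a list of (user, movie, rating) triples. A minimal sketch, using 0-based indices (so Alice = user 0, Gladiator = movie 0, one less than the 1-based indices above); the middle ratings are elided just as in the table:

```python
# Observed ratings as (user index, movie index, rating) triples, 0-based.
ratings = [
    (0, 0, 5),  # (Alice, Gladiator, 5)
    (0, 1, 4),  # (Alice, Silence, 4)
    (0, 2, 1),  # (Alice, WALL-E, 1)
    (1, 1, 5),  # (Bob, Silence, 5)
    # ... remaining observed ratings ...
    (4, 0, 5),  # (Eve, Gladiator, 5)
    (4, 1, 4),  # (Eve, Silence, 4)
]
```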

Matrix Factorization Model

Assume each user has an unknown weight vector \(\mathbf{u}_i \in \mathbb{R}^d\) and each movie has an unknown weight vector \(\mathbf{v}_j \in \mathbb{R}^d\).

The predicted rating is

\[ h(i,j) = u_{i1}v_{j1} + u_{i2} v_{j2} + \ldots + u_{id} v_{jd} = \mathbf{u}_i^T \mathbf{v}_j \]


Unlike in previous problems, we don’t observe the features or the weights; we need to learn both from the observed ratings.
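As a quick numeric illustration of the prediction formula (the dimension \(d = 2\) and the vector values below are made up):

```python
import numpy as np

# Hypothetical learned vectors for one user i and one movie j, with d = 2.
u_i = np.array([1.0, 0.5])
v_j = np.array([4.0, 2.0])

# Predicted rating h(i, j) = u_i^T v_j = 1.0*4.0 + 0.5*2.0
h_ij = u_i @ v_j
print(h_ij)  # 5.0
```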


Learning problem

Find parameters such that \(h(i_k, j_k) = \mathbf{u}_{i_k}^T \mathbf{v}_{j_k} \approx r(i_k, j_k)\) for \(k = 1,\ldots,L\) (the training data), while taking appropriate measures, such as regularization, to avoid overfitting.
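One standard way to solve this (a sketch, not necessarily what the starter code expects) is stochastic gradient descent on the regularized squared error; the learning rate, regularization strength, and default \(d\) below are arbitrary illustrative choices:

```python
import numpy as np

def train_mf(ratings, n_users, n_movies, d=2, lr=0.01, reg=0.1, n_epochs=200, seed=0):
    """SGD on sum_k (u_i^T v_j - r)^2 + reg * (||u_i||^2 + ||v_j||^2).

    ratings: list of (user index, movie index, rating) triples, 0-based.
    Returns U (n_users x d) and V (n_movies x d) with rows u_i and v_j.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, d))   # small random init
    V = 0.1 * rng.standard_normal((n_movies, d))
    for _ in range(n_epochs):
        for i, j, r in ratings:
            err = U[i] @ V[j] - r                 # prediction error on this rating
            grad_u = 2 * err * V[j] + 2 * reg * U[i]
            grad_v = 2 * err * U[i] + 2 * reg * V[j]
            U[i] -= lr * grad_u
            V[j] -= lr * grad_v
    return U, V
```

For the toy data above one would call `train_mf(ratings, n_users=5, n_movies=4)` and predict the missing entries with `U[i] @ V[j]`; treat the hyperparameter defaults as starting points to tune, not tuned values.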

Why is This Called Matrix Factorization?
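To see where the name comes from, stack the user vectors as the rows of a matrix \(U \in \mathbb{R}^{n \times d}\) and the movie vectors as the rows of \(V \in \mathbb{R}^{m \times d}\), where \(n\) is the number of users and \(m\) the number of movies. The model then predicts the entire \(n \times m\) rating matrix \(R\) as a rank-\(d\) factorization:

\[ R \approx U V^T, \qquad R_{ij} \approx \mathbf{u}_i^T \mathbf{v}_j \]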

Your Job: Solve the Learning Problem



Data and Code

Link to starter code and data

Further Reading

“Matrix Factorization Techniques for Recommender Systems” by Yehuda Koren, Robert Bell, and Chris Volinsky (IEEE Computer, 2009)

Model Extensions

Once you nail the matrix factorization model, here are some ideas to get even better performance.

Biases-only baseline

A simpler model that helps introduce important ideas is the biases-only model. It has an overall baseline score \(\mu\), an offset (or “bias”) \(a_i\) for each user, and a bias \(b_j\) for each movie. The model is:

\[ h(i,j) = \mu + a_i + b_j \]

For example, \(\mu\) might be the average rating over all movies, \(a_i\) is positive for a user who rates generously, and \(b_j\) is positive for a widely liked movie.


To learn these parameters, write down the partial derivatives of the cost function with respect to \(\mu\), \(a_i\), and \(b_j\) and plug them into stochastic gradient descent.
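A sketch of those updates under squared-error loss (the step size and epoch count are illustrative, and regularization is omitted for brevity):

```python
import numpy as np

def train_biases(ratings, n_users, n_movies, lr=0.05, n_epochs=200):
    """SGD on sum_k (mu + a_i + b_j - r)^2 for the biases-only model."""
    mu = np.mean([r for _, _, r in ratings])  # start at the global mean rating
    a = np.zeros(n_users)
    b = np.zeros(n_movies)
    for _ in range(n_epochs):
        for i, j, r in ratings:
            err = mu + a[i] + b[j] - r
            # The partial derivative of (h - r)^2 is 2 * err
            # for each of mu, a_i, and b_j.
            mu -= lr * 2 * err
            a[i] -= lr * 2 * err
            b[j] -= lr * 2 * err
    return mu, a, b
```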

Matrix Factorization + Biases

The biases-only model can be incorporated into the matrix factorization model to improve performance:

\[ h(i,j) = \mu + a_i + b_j + \mathbf{u}_i^T \mathbf{v}_j \]


To learn these parameters, combine the partial derivatives from the basic matrix factorization model with those from the biases-only model and update all of the parameters within stochastic gradient descent.
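The combined inner loop might look like the following sketch. Here `mu` (a float), the bias vectors `a` and `b`, and the weight matrices `U` and `V` are assumed to already be initialized elsewhere; the names and constants are illustrative, not prescribed:

```python
import numpy as np

def sgd_epoch(ratings, mu, a, b, U, V, lr=0.01, reg=0.1):
    """One SGD pass for h(i,j) = mu + a_i + b_j + u_i^T v_j with L2 regularization.

    Mutates a, b, U, V in place; returns the updated mu.
    """
    for i, j, r in ratings:
        err = mu + a[i] + b[j] + U[i] @ V[j] - r
        # Compute both latent-factor gradients before updating either vector.
        grad_u = 2 * err * V[j] + 2 * reg * U[i]
        grad_v = 2 * err * U[i] + 2 * reg * V[j]
        mu -= lr * 2 * err
        a[i] -= lr * (2 * err + 2 * reg * a[i])
        b[j] -= lr * (2 * err + 2 * reg * b[j])
        U[i] -= lr * grad_u
        V[j] -= lr * grad_v
    return mu
```

Note that `grad_u` and `grad_v` are computed before either vector is modified, so each update uses the values from the start of the step.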