16 Recommender Systems

Problem Formulation

Recommender systems are an important application of machine learning.
They also illustrate the idea of learning the features automatically instead of designing them by hand.

Content Based Recommendations

User j’s parameter vector: [latex]\theta^{(j)}[/latex]

Movie i’s feature vector: [latex]x^{(i)}[/latex]

Predicted rating of movie i by user j: [latex](\theta^{(j)})^Tx^{(i)}[/latex]

Cost function for user [latex]j[/latex]: [latex]\underset {\theta ^{(j)}}{min} \frac {1}{2} \sum _{i:r(i,j)=1} ((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+ \frac {\lambda}{2} \sum_{k=1}^{n} (\theta_k^{(j)})^2[/latex]

Cost function for all users: [latex]\underset {\theta ^{(1)}, \cdots , \theta ^{(n_u)}}{min} \frac {1}{2} \sum _{j=1}^{n_u} \sum _{i:r(i,j)=1} ((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+ \frac {\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2[/latex]

Gradient descent update: [latex]\left\{\begin{matrix} \theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:r(i,j)=1} ((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x_{k}^{(i)} & \ (for \ k \ = \ 0) \\ \theta_k^{(j)} := \theta_k^{(j)} - \alpha (\sum_{i:r(i,j)=1} ((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x_{k}^{(i)} + \lambda \theta_{k}^{(j)}) & \ (for \ k \ \neq \ 0) \end{matrix}\right.[/latex]
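The update rule above can be written as a few matrix operations. The following is a minimal NumPy sketch, not a reference implementation. The function name `content_based_step` and the array layout (ratings stored as a movies-by-users matrix `Y` with an indicator matrix `R`, and a constant feature x_0 = 1 in column 0 of `X`) are assumptions made for illustration.

```python
import numpy as np

def content_based_step(Theta, X, Y, R, alpha, lam):
    """One gradient-descent step on every user's parameter vector.

    Assumed layout (hypothetical, chosen for this sketch):
      Theta: (n_u, n+1) user parameters, column 0 is theta_0
      X:     (n_m, n+1) movie features, column 0 is the constant x_0 = 1
      Y:     (n_m, n_u) ratings
      R:     (n_m, n_u) with R[i, j] = 1 if user j rated movie i, else 0
    """
    # Prediction error, zeroed wherever r(i, j) = 0
    err = (X @ Theta.T - Y) * R
    # grad[j, k] = sum over rated movies i of err[i, j] * x_k^(i)
    grad = err.T @ X
    # Regularize every parameter except the k = 0 term
    grad[:, 1:] += lam * Theta[:, 1:]
    return Theta - alpha * grad
```

Calling the function repeatedly, for example `Theta = content_based_step(Theta, X, Y, R, alpha=0.1, lam=0.0)` inside a loop, performs plain batch gradient descent for all users at once.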

Collaborative Filtering

Collaborative filtering works even when neither the users’ parameters nor the movies’ features are given in advance: both are learned from the ratings. The term collaborative filtering refers to the observation that, when the algorithm runs over a large set of users, those users are effectively collaborating to get better movie ratings for everyone. Every user rates some subset of the movies, and each rating helps the algorithm learn slightly better features. By rating a few movies myself, I help the system learn better features, and the system can then use these features to make better movie predictions for everyone else. In this sense every user is helping the system learn better features for the common good.

Collaborative Filtering Algorithm

Initialize x and theta to small random values.
Minimize the cost function using gradient descent or one of the advanced optimization algorithms.
Predict the rating of movie i by user j as [latex](\theta^{(j)})^Tx^{(i)}[/latex].
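The three steps above can be sketched in NumPy as follows. This assumes the usual joint cost for collaborative filtering: squared error over rated pairs plus an L2 penalty on both x and theta, with no intercept feature. The function names, the default hyperparameters, and the movies-by-users layout of `Y` and `R` are hypothetical choices for this sketch.

```python
import numpy as np

def cofi_cost_and_grads(X, Theta, Y, R, lam):
    """Joint cost and gradients; all entries of X and Theta are regularized."""
    err = (X @ Theta.T - Y) * R                  # only rated pairs contribute
    cost = 0.5 * np.sum(err ** 2) \
        + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    X_grad = err @ Theta + lam * X               # (n_m, n)
    Theta_grad = err.T @ X + lam * Theta         # (n_u, n)
    return cost, X_grad, Theta_grad

def fit_cofi(Y, R, n_features=2, lam=0.1, alpha=0.01, n_iters=2000, seed=0):
    """Step 1: small random init. Step 2: batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    # Small random values break the symmetry between features
    X = 0.1 * rng.standard_normal((n_m, n_features))
    Theta = 0.1 * rng.standard_normal((n_u, n_features))
    for _ in range(n_iters):
        _, X_grad, Theta_grad = cofi_cost_and_grads(X, Theta, Y, R, lam)
        # Both gradients are computed before either matrix is updated
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta

def predict(X, Theta):
    """Step 3: entry (i, j) is (theta^(j))^T x^(i)."""
    return X @ Theta.T
```

Plain gradient descent is used here only because it is the simplest option; an advanced optimizer could be substituted by passing the same cost and gradients to it.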

Vectorization: Low Rank Matrix Factorization

A user has recently been looking at one product. Are there other related products that you could recommend to this user? Given movie i, if you can find a different movie j such that the distance [latex]\|x^{(i)} - x^{(j)}\|[/latex] between [latex]x^{(i)}[/latex] and [latex]x^{(j)}[/latex] is small, then this is a pretty strong indication that movies j and i are somehow similar. Use the learned features to find movies or products that are related to each other.
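Finding related movies from the learned features amounts to a nearest-neighbour search in feature space. A minimal sketch, assuming the learned features are stored as the rows of a NumPy array `X` and using a hypothetical helper name `most_similar_movies`:

```python
import numpy as np

def most_similar_movies(X, i, k=5):
    """Return the indices of the k movies whose feature vectors are
    closest (in Euclidean distance) to movie i's feature vector."""
    dists = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    dists[i] = np.inf                          # exclude movie i itself
    return np.argsort(dists)[:k].tolist()
```

Euclidean distance follows the text above; other similarity measures could be swapped in without changing the structure.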

Implementational Detail: Mean Normalization

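If a user has not rated any movies, which movies should we recommend? Without any ratings, only the regularization term affects that user’s theta, so it is driven to zero and every predicted rating is 0. Mean normalization fixes this problem: subtract each movie’s mean rating before learning and add it back when predicting, so that a new user is predicted to give each movie its average rating.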
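Mean normalization can be sketched as a small preprocessing step plus a correction at prediction time. The helper names and the movies-by-users layout of `Y` and `R` are assumptions for illustration. Giving a movie with no ratings at all a mean of 0 is also an assumption of this sketch.

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only.

    Y: (n_m, n_u) ratings; R: (n_m, n_u) with R[i, j] = 1 if rated.
    Returns the normalized ratings and the per-movie means mu.
    """
    counts = R.sum(axis=1)
    # Guard against division by zero for movies nobody has rated (assumption)
    mu = (Y * R).sum(axis=1) / np.maximum(counts, 1)
    Y_norm = (Y - mu[:, None]) * R             # unrated entries stay 0
    return Y_norm, mu

def predict_with_mean(X, Theta, mu):
    """Prediction for movie i, user j: (theta^(j))^T x^(i) + mu_i."""
    return X @ Theta.T + mu[:, None]
```

The model is trained on `Y_norm` instead of `Y`. A user whose parameter vector is all zeros then receives each movie's mean rating as the prediction rather than 0.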