Consider a list of movies as "items". Each movie has an associated rating, which we will call its "score". With this, we write the following notation:
s(j) = \text{score for item } j
\\
\text{where } j = 1 \dots M, \text{ if there are } M \text{ items}

The most basic algorithm is to use the average rating for item j as its score:
s(j) = \frac{\sum_{i \in \Omega_j} r_{i,j}}{|\Omega_j|}
\\
\text{where } \Omega_j \equiv \text{set of all users who rated item } j
\\
r_{i,j} \equiv \text{rating user } i \text{ gave item } j
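As a quick illustration, here is a minimal sketch of this computation in Python. The `ratings` dictionary (mapping `(user, item)` pairs to rating values) and the function name `average_score` are illustrative assumptions, not part of any particular library.

```python
def average_score(ratings, j):
    """Average rating for item j over all users who rated it."""
    # Ratings r_{i,j} for users i in Omega_j (everyone who rated item j)
    omega_j = [r for (i, item), r in ratings.items() if item == j]
    return sum(omega_j) / len(omega_j)  # divide by |Omega_j|

# Example: three users rated item 0, one user rated item 1
ratings = {(0, 0): 4.0, (1, 0): 5.0, (2, 0): 3.0, (1, 1): 2.0}
print(average_score(ratings, 0))  # (4.0 + 5.0 + 3.0) / 3 = 4.0
```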
Score Personalisation
The score s(i, j) can depend on both the user i and the item (e.g. movie) j.
s(i,j) = \frac{\sum_{i' \in \Omega_j} r_{i',j}}{|\Omega_j|}
\\
\text{where } i = 1 \dots N, \quad N \equiv \text{number of users}

The above equation does not change anything relative to the average-rating equation, as the score still does not depend on user i. So, every user still sees the same score for each item.
The above equation is actually meant to introduce a new convention and symbols, which we will modify later on to introduce personalisation; we will see this in an upcoming blog article.
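To make this concrete, here is a small sketch of the new signature, assuming the same dictionary representation as above: `score` now accepts a user index i but ignores it, so every user receives the same value.

```python
def score(ratings, i, j):
    """s(i, j): accepts the user index i but ignores it for now,
    so every user sees the same average score for item j."""
    omega_j = [r for (u, item), r in ratings.items() if item == j]
    return sum(omega_j) / len(omega_j)

ratings = {(0, 0): 4.0, (1, 0): 5.0, (2, 0): 3.0}
# Two different users receive an identical score for item 0
assert score(ratings, i=0, j=0) == score(ratings, i=7, j=0)
```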
Ratings Matrix
The central object for any recommender system is the ratings matrix. It is defined as follows:
r_{i,j} = \text{rating user } i \text{ gave item } j
\\
\text{where } i = 1 \dots N, \; j = 1 \dots M
\\
R_{N \times M} \equiv \text{user-item ratings matrix of size } N \times M

Sparsity
The user-item ratings matrix is sparse because most of its entries are empty. Below is an example of a dummy ratings matrix, where each row represents a user and each column represents an item (e.g. movie).
\begin{bmatrix}
* & 4.5 & 2.0 & *\\
4.0 & * & 3.5 & *\\
* & 5.0 & * & 2.0\\
* & 3.5 & 4.0 & 1.0
\end{bmatrix}

Entries marked * mean there is no rating (score) for that specific user-item pair.
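Since most entries are missing, storing the full dense matrix wastes memory, so in practice such a matrix is usually kept in a sparse format. Below is a minimal sketch of the dummy matrix above using scipy's `csr_matrix`; the variable names are illustrative.

```python
from scipy.sparse import csr_matrix

# Observed entries of the 4x4 dummy matrix above;
# the missing (*) entries are simply not stored.
rows = [0, 0, 1, 1, 2, 2, 3, 3, 3]
cols = [1, 2, 0, 2, 1, 3, 1, 2, 3]
vals = [4.5, 2.0, 4.0, 3.5, 5.0, 2.0, 3.5, 4.0, 1.0]

R = csr_matrix((vals, (rows, cols)), shape=(4, 4))
print(R.toarray())  # dense view; zeros stand in for missing entries
print(R.nnz)        # 9 observed ratings out of 16 cells
```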
Collaborative Filtering Goal
We want to make recommendations. Thus, it's a good thing that the ratings matrix is generally sparse, so that there are actually items left to recommend.
We want to guess what rating a user might give to a movie they have not yet seen. Thus,
s(i,j) \equiv \hat{r}(i,j) \equiv \text{guess what user } i \text{ might rate item } j

We want to predict a real number, so a suitable objective for this task is the mean squared error (MSE):
MSE = \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} (r_{ij} - \hat{r}_{ij})^2
\\
\text{where } \Omega \equiv \text{set of pairs } (i,j) \text{ where user } i \text{ has rated item } j
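As a minimal sketch, assuming the observed ratings and the model's predictions are both stored as dictionaries keyed by (i, j) pairs (an illustrative representation, not a fixed API), the MSE can be computed as:

```python
def mse(ratings, predictions):
    """Mean squared error over Omega, the set of observed (i, j) pairs.

    ratings:     dict mapping (i, j) -> true rating r_ij
    predictions: dict mapping (i, j) -> predicted rating r_hat_ij
    """
    errors = [(r - predictions[(i, j)]) ** 2 for (i, j), r in ratings.items()]
    return sum(errors) / len(errors)  # divide by |Omega|

ratings = {(0, 1): 4.5, (1, 0): 4.0}
predictions = {(0, 1): 4.0, (1, 0): 3.0}
print(mse(ratings, predictions))  # (0.25 + 1.0) / 2 = 0.625
```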
Collaborative Filtering Types
There are two different ways to apply collaborative filtering for making recommendations:
User-User Collaborative Filtering
Item-Item Collaborative Filtering