Welcome to the fifth episode of Fastdotai, where we will deal with a **Movie Recommendation System**. Before we start, I would like to thank **Jeremy Howard** and **Rachel Thomas** for their efforts to democratize AI.

To make the best of this blog post series, feel free to explore the first part of the series in the following order:

- Dog Vs Cat Image Classification
- Dog Breed Image Classification
- Multi-label Image Classification
- Time Series Analysis using Neural Network
- NLP- Sentiment Analysis on IMDB Movie Dataset
- Basic of Movie Recommendation System
- Collaborative Filtering from Scratch
- Collaborative Filtering using Neural Network
- Writing Philosophy like Nietzsche
- Performance of Different Neural Network on Cifar-10 dataset
- ML Model to detect the biggest object in an image Part-1
- ML Model to detect the biggest object in an image Part-2

Grab some popcorn and let's get started.

First of all, let's import all the required packages.

%reload_ext autoreload

%autoreload 2

%matplotlib inline

from fastai.learner import *

from fastai.column_data import *

Set the paths where:

- Input data is stored.
- Temporary files will be stored. (Optional, to be used in Kaggle kernels.)
- Model weights will be stored. (Optional, to be used in Kaggle kernels.)

path='../input/'

tmp_path='/kaggle/working/tmp/'

models_path='/kaggle/working/models/'

- Read the data.

ratings = pd.read_csv(path+'ratings.csv')

ratings.head()

# This contains the user id, the movie that the user rated, the timestamp of the rating, and the rating provided by the user.

movies = pd.read_csv(path+'movies.csv')

movies.head()

# This table is just for information and is not intended for modelling.

**CREATING A CROSSTAB OF TOP MOVIES AND USERID :-**

g=ratings.groupby('userId')['rating'].count()

topUsers=g.sort_values(ascending=False)[:15]

# Users who have given the most ratings are considered top users.

g=ratings.groupby('movieId')['rating'].count()

topMovies=g.sort_values(ascending=False)[:15]

# Movies that have received the most ratings are the top movies.

top_r = ratings.join(topUsers, rsuffix='_r', how='inner', on='userId')

top_r = top_r.join(topMovies, rsuffix='_r', how='inner', on='movieId')

pd.crosstab(top_r.userId, top_r.movieId, top_r.rating, aggfunc=np.sum)

We will go through three ways of dealing with movie recommendation.

First of all, we will dive into the matrix factorization approach:

**MATRIX FACTORIZATION:-**

The table in the left box holds the actual ratings. It's our actual data.

Let me discuss in detail how the right table is built and how the left and right tables relate.

**MAKING UP OF PREDICTED RATINGS TABLE (RIGHT TABLE)**

- The right table has user ids (users) as rows and movie ids (movies) as columns. Each movie id and user id is described by a row of an embedding matrix. Remember the embedding matrix we discussed in the last blog post: it is made up of embedding vectors which are, at the beginning, just random numbers. They are shown in purple in the diagram above.
- For example, User Id 14 is represented by four random numbers. Similarly, Movie Id 27 is represented by four random numbers. The sum of the products of these numbers gives the predicted rating. Every number in these embedding vectors is initialized randomly at the beginning. In other words, every predicted rating is the dot product of two embedding vectors, and the whole table of predictions is the matrix product of the two embedding matrices.
- Our objective is to minimize the RMSE between the predicted ratings and the actual ratings. Consider the formula below.

The **SUMXMY2** function calculates the sum of the squares of the differences between corresponding items in two arrays. To break the formula down: it takes the squared difference between each predicted and actual value and sums them up, which gives a single number. This number is then divided by the number of ratings, which gives the MSE (Mean Squared Error). Finally we take the square root of that number to get the RMSE. In the figure above, this number is shown in blue. That is our objective function to be minimized.
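As a rough sketch of the arithmetic described above, here is the dot-product prediction and the RMSE in plain Python. The vector values and the helper names `predict_rating` and `rmse` are made up for illustration; they are not taken from the spreadsheet or from fastai.

```python
import math

# Hypothetical 4-number embedding vectors (in practice they start out random).
user_14 = [0.2, 1.1, -0.4, 0.9]
movie_27 = [1.5, 0.8, 0.3, 1.2]

def predict_rating(user_vec, movie_vec):
    """Predicted rating = sum of element-wise products (a dot product)."""
    return sum(u * m for u, m in zip(user_vec, movie_vec))

def rmse(predicted, actual):
    """Square root of (sum of squared differences / number of ratings)."""
    # The sum of squared differences is the quantity SUMXMY2 computes.
    sum_sq_diff = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum_sq_diff / len(actual))

pred = predict_rating(user_14, movie_27)
print(pred)

# Made-up predicted and actual ratings for three user/movie pairs.
print(rmse([3.0, 4.0, 5.0], [3.0, 5.0, 3.0]))
```

Note that a negative entry (such as the third number of `user_14`) pulls the predicted rating down whenever the matching movie entry is positive.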

*??? ANY QUESTIONS ???*

**What do these embedding vectors mean?**

- Initially they are random, but after training they start making sense. Check these rating values after a couple of epochs: the values keep updating, so after a couple of epochs the predicted ratings will be close to the actual ratings, and the embedding vectors will have adjusted themselves accordingly. For example, take Movie Id 27 (Lord of the Rings), whose embedding vector consists of 4 numbers as shown below:

- Say each cell denotes a property such as % sci-fi, % CGI-based, % dialogue-driven, % modern or % comedy. Together the cells describe the genre of the movie.

- Similarly, each number in the embedding vector of a user id denotes how much User Id 14 likes sci-fi movies, modern CGI movies, dialogue-driven movies and so on.
- We will discuss the bias later on.

**NOTE:- Here we don’t have any non-linear activation function or any kind of hidden layer, so this is considered an example of shallow learning.**

**Qs:- How is Collaborative Filtering the same as Probabilistic Matrix Factorization?**

Here we get each predicted rating as the dot product of two different vectors. The problem is that we don’t have proper information about each user or movie, so we assume this is a reasonable way of modelling the system and use SGD to find the numbers that work best.
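To make the "use SGD to find the numbers" step concrete, here is a minimal sketch in plain Python. It is not what fastai does internally: the toy ratings, the learning rate `lr`, the number of epochs and all names are assumptions chosen for illustration, and there is no bias term or regularization.

```python
import math
import random

random.seed(0)

# Toy data, made up for illustration: (user index, movie index, actual rating).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_movies, n_factors = 3, 3, 4

# Embedding vectors start out as small random numbers.
user_emb = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
movie_emb = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_movies)]

def predict(u, m):
    """Predicted rating is the dot product of a user and a movie vector."""
    return sum(a * b for a, b in zip(user_emb[u], movie_emb[m]))

def rmse():
    """RMSE over the observed ratings only."""
    total = sum((predict(u, m) - r) ** 2 for u, m, r in toy_ratings)
    return math.sqrt(total / len(toy_ratings))

rmse_before = rmse()
lr = 0.01  # learning rate (an assumed value)
for epoch in range(500):
    for u, m, r in toy_ratings:
        err = predict(u, m) - r
        for k in range(n_factors):
            # Gradients of 0.5 * err**2 with respect to each embedding entry.
            grad_u = err * movie_emb[m][k]
            grad_m = err * user_emb[u][k]
            user_emb[u][k] -= lr * grad_u
            movie_emb[m][k] -= lr * grad_m
rmse_after = rmse()

print(rmse_before, rmse_after)
```

The error after training should be lower than before, which is the whole point: the random numbers are nudged until their dot products match the known ratings.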

**Qs:- How to decide the length of these Embedding vectors?**

We should choose an embedding dimensionality that is large enough to represent the true complexity of the problem at hand. At the same time, it should not be so big that it has too many parameters, takes too long to run, or overfits even with regularization.

**Qs:- What does a negative number denote in an embedding vector?**

A negative number in a movie's embedding vector denotes that the movie doesn’t belong to that particular genre. A negative number in a user's embedding vector denotes that the user doesn’t like that particular genre of movie.

**Qs:- What happens when we have a new movie or a new user?**

When we sign up to Netflix as a new user, it asks which movies we like. It then retrains its model so as to give good recommendations.

**TIME FOR SOME HANDS ON COLLABORATIVE FILTERING :-**

The **collaborative filtering** approach to recommendation is built on the concepts of user and item. Suppose User Id 14 likes Movie Id 24. The collaborative filtering approach asks which other users also liked Movie Id 24. It then goes through the list of movies liked by those users, who share the same preference as User Id 14, and recommends those movies to User Id 14.

So the data has two parts:

- UserId and MovieId
- Rating Values(The Dependent Variable)
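The idea described above (find the users who liked the same movie, then recommend what else they liked) can be sketched in a few lines of plain Python. This is only an illustration of the intuition, with made-up data and a hypothetical `recommend` helper; the fastai code in this post learns embeddings instead of counting overlaps.

```python
from collections import Counter

# Made-up data: which movie ids each user id liked.
liked = {
    14: {24, 27},
    7: {24, 30, 31},
    9: {24, 30},
    5: {99},
}

def recommend(user_id, liked, top_n=3):
    """Recommend movies liked by users who share at least one liked movie."""
    mine = liked[user_id]
    counts = Counter()
    for other, movies in liked.items():
        if other == user_id or not (mine & movies):
            continue  # skip the user themselves and users with no overlap
        counts.update(movies - mine)  # only movies the user has not liked yet
    return [movie for movie, _ in counts.most_common(top_n)]

print(recommend(14, liked))  # [30, 31]
```

Users 7 and 9 both liked movie 24, so their other movies are suggested to user 14, with movie 30 first because two similar users liked it.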

val_idxs = get_cv_idxs(len(ratings))

wd=2e-4 # L2 regularization (weight decay), helps prevent overfitting

n_factors = 50 # Embedding dimensionality

cf = CollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')

# 1. path - where the file is stored.

# 2. 'ratings.csv' - the CSV file which contains the data to be read.

# 3. 'userId' - the column to use as users (the rows).

# 4. 'movieId' - the column to use as items (the columns).

# 5. 'rating' - the values to predict.

learn = cf.get_learner(n_factors, val_idxs, 64, opt_fn=optim.Adam, tmp_name=tmp_path, models_name=models_path)

# Finally, we train our model

learn.fit(1e-2, 2, wds=wd, cycle_len=1, cycle_mult=2)

math.sqrt(0.766)

# 0.8752142594816426

# Let's compare to some benchmarks. Here are some benchmarks on the same dataset for the popular Librec system for collaborative filtering. They show best results based on an RMSE of 0.91. We need to take the square root of our loss, since we use plain MSE.

preds = learn.predict()

**LETS ANALYZE THE RESULTS:**

movie_names = movies.set_index('movieId')['title'].to_dict()

# A dictionary mapping each movieId to its title.

g=ratings.groupby('movieId')['rating'].count()

# How many ratings each movie got.

topMovies=g.sort_values(ascending=False).index.values[:3000]

# Take the movieIds of the 3000 movies that have the most ratings.

topMovieIdx = np.array([cf.item2idx[o] for o in topMovies])

# Replace the movieIds with contiguous ids.

# Check out our model below. It has an embedding vector of length 50 for each movie and each user, plus a bias for each movie and each user.

m = learn.model # the underlying PyTorch model held by the learner

# First, we'll look at the movie bias term. Here, our input is the movie id (a single id), and the output is the movie bias (a single float).

movie_bias = to_np(m.ib(V(topMovieIdx)))

- The code `to_np(m.ib(V(topMovieIdx)))` passes each of the movie ids through the bias embedding layer and returns its bias as a Variable, which is then converted to a NumPy array.
- `m.ib` refers to the embedding layer for the item/movie bias. As we know, there are 9066 movies, each with a bias associated with it. `m.ib` returns the values of that layer.
- Models and layers require Variables to keep track of gradients, hence `V(…)`.
- To convert a tensor into a NumPy array, use `to_np()`.
- To move a model from GPU to CPU for inference, use `m.cpu()`. To move it to the GPU, use `m.cuda()`.
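Conceptually, an embedding layer is a lookup table, and the contiguous ids are simply row numbers in that table. The sketch below mimics this with plain Python lists. The movie ids, the bias values and the `lookup` helper are made up, and `item2idx` here is a stand-in built by hand rather than fastai's own `cf.item2idx`.

```python
# Raw movie ids are sparse (made-up values here), so map them to 0..n-1.
raw_movie_ids = [1, 32, 260, 1196]
item2idx = {movie_id: idx for idx, movie_id in enumerate(raw_movie_ids)}

# A bias "embedding layer" is a table with one row per movie.
# Each row holds a single number (hypothetical values).
movie_bias_table = [[0.31], [-0.12], [0.85], [0.40]]

def lookup(table, indices):
    """Return the table rows at the given contiguous indices."""
    return [table[i] for i in indices]

# Convert raw ids to contiguous indices, then look up their rows.
top_idx = [item2idx[movie_id] for movie_id in [260, 1]]
print(lookup(movie_bias_table, top_idx))  # [[0.85], [0.31]]
```

Each returned row has a single element, which is why the post later reads the bias as `b[0]`.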

movie_ratings = [(b[0], movie_names[i]) for i,b in zip(topMovies,movie_bias)]

# A list comprehension where each bias from movie_bias is stored in b and each movie id from topMovies in i; movie_names[i] looks up the title. The output below is a list of tuples holding each movie's bias and its title.

Sort the movies by bias (i.e. the 0th element of each tuple, using a lambda function). On inspection we find that the bias denotes the quality of the movie: good movies have a positive bias and bad movies a negative one. This is how to interpret the bias terms.

sorted(movie_ratings, key=lambda o: o[0], reverse=True)[:15]

# Sort the movies by bias (i.e. the 0th element of each tuple, using a lambda function). reverse=True means descending order of bias values.

**LET’S INTERPRET THE EMBEDDING VECTORS:-**

movie_emb = to_np(m.i(V(topMovieIdx)))

# m.i(...) for item embeddings.

movie_emb.shape

# Because it's hard to interpret 50 embedding dimensions, we use PCA to reduce them to just 3 components.

from sklearn.decomposition import PCA

pca = PCA(n_components=3)

movie_pca = pca.fit(movie_emb.T).components_

movie_pca.shape

fac0 = movie_pca[0]

movie_comp = [(f, movie_names[i]) for f,i in zip(fac0, topMovies)]

# Here's the 1st component. It seems to be 'easy watching' vs 'serious'.

# It's up to us to decide what these embeddings mean. Check the output below.

sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]

# Let's interpret the 2nd component.

fac1 = movie_pca[1]

movie_comp = [(f, movie_names[i]) for f,i in zip(fac1, topMovies)]

# It seems to be 'CGI' vs 'dialog driven'.

sorted(movie_comp, key=itemgetter(0), reverse=True)[:10]
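To get a feel for what a principal component is, here is a small sketch that finds the first component of a toy stand-in for `movie_emb.T`. It uses power iteration on the covariance matrix, which is a swapped-in technique chosen to keep the example dependency-free; it is not how scikit-learn's `PCA` is implemented. The data and the `first_component` helper are made up for illustration.

```python
import math

# Toy stand-in for movie_emb.T: 4 embedding dimensions (rows) x 5 movies (columns).
# Every row is a multiple of `loading`, so the movies vary along one direction only.
factor = [2.0, -1.0, 0.5, -1.5]
loading = [1.0, 0.8, 0.0, -0.9, -1.0]
emb_t = [[f * l for l in loading] for f in factor]

def first_component(rows, n_iter=100):
    """Unit-length direction of greatest variance, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Covariance matrix between columns (one column per movie).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then rescale to unit length.
        vec = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

comp = first_component(emb_t)
print(comp)  # one weight per movie
```

The component holds one weight per movie, just as `fac0` does above. Movies at opposite ends of the sorted list get weights of opposite sign, and the overall sign of a component is arbitrary, so only the contrast between the two ends is meaningful.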

This is how we analyze a movie recommendation system. The evaluation criterion is RMSE, which we want to minimize. The earlier benchmark error was **0.91**; using fastai we arrive at **0.87**.

Hence this model is performing well.

In the next part, we will deal with Collaborative Filtering from Scratch.

*If you like it, then **ABC** (**Always Be Clapping** 👏👏👏👏👏😃😃😃😃😃😃😃😃😃👏👏👏👏👏👏).*

If you have any questions, feel free to reach out on the fast.ai forums or on Twitter: @ashiskumarpanda

*P.S. This blog post will be updated and improved as I continue with other lessons. For more interesting stuff, feel free to check out my **Github** account.*


**Edit 1:- TFW Jeremy Howard approves of your post .** 💖💖 🙌🙌🙌 💖💖 .