Movie Recommendation: Collaborative Filtering using Neural Network

To make the best out of this blog post series, feel free to explore the parts of this series in the following order:

  1. Dog Vs Cat Image Classification
  2. Dog Breed Image Classification
  3. Multi-label Image Classification
  4. Time Series Analysis using Neural Network
  5. NLP- Sentiment Analysis on IMDB Movie Dataset
  6. Basic of Movie Recommendation System
  7. Collaborative Filtering from Scratch
  8. Collaborative Filtering using Neural Network
  9. Writing Philosophy like Nietzsche
  10. Performance of Different Neural Network on Cifar-10 dataset
  11. ML Model to detect the biggest object in an image Part-1
  12. ML Model to detect the biggest object in an image Part-2

Welcome to the third part of the fifth episode of Fastdotai, where we will deal with Collaborative Filtering using a Neural Network, a technique widely used in recommendation systems. Before we start, I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI.

Brace yourself, because it's showtime.

First of all, let's import all the required packages.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.learner import *
from fastai.column_data import *

Set the paths where:

  • Input data is stored.
  • Temporary files will be stored. (Optional- To be used in kaggle kernels)
  • Model weights will be stored. (Optional- To be used in kaggle kernels)
path='../input/'
tmp_path='/kaggle/working/tmp/'
models_path='/kaggle/working/models/'
  • Read the data.
ratings = pd.read_csv(path+'ratings.csv')
ratings.head()
# This contains the userId, the movieId that the user watched, the rating
# the user gave to that movie, and the timestamp of when the rating was made.
movies = pd.read_csv(path+'movies.csv')
movies.head()
# This table is just for informational purposes and is not used for modelling.
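For reference, both files follow the standard MovieLens schema (assuming the usual ml-latest-small layout); a quick sanity check after loading:

# ratings.csv -> userId, movieId, rating, timestamp
# movies.csv  -> movieId, title, genres
print(ratings.columns.tolist())
print(movies.columns.tolist())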
u_uniq = ratings.userId.unique()
user2idx = {o:i for i,o in enumerate(u_uniq)}
# Map every unique user id to a contiguous integer index.
ratings.userId = ratings.userId.apply(lambda x: user2idx[x])
# Replace each userId with its contiguous index.
# Similarly, we do it for the movies.
m_uniq = ratings.movieId.unique()
movie2idx = {o:i for i,o in enumerate(m_uniq)}
ratings.movieId = ratings.movieId.apply(lambda x: movie2idx[x])
n_users, n_movies = len(u_uniq), len(m_uniq)
# Number of unique users and movies: needed below to size the embeddings.
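Why contiguous? An embedding matrix is just a lookup table indexed by row, so the ids must run from 0 to n-1 without gaps. A minimal sketch of what the mapping does, with made-up ids:

raw_ids = [3, 17, 42]                      # hypothetical raw userIds with gaps
idx_map = {o:i for i,o in enumerate(raw_ids)}
print(idx_map)                             # {3: 0, 17: 1, 42: 2}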

In our earlier approach we took a movie embedding and a user embedding and computed their dot product to produce a single number, which is our predicted rating.

The same thing can be done via a Neural Network approach. In the Neural Network approach, we take both embeddings, concatenate them, and feed the result into a neural network to produce a single number, i.e. the rating.
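The difference between the two approaches boils down to a single line. A minimal sketch with dummy embedding vectors (the 10-factor size is an assumption for illustration):

import torch
u = torch.randn(1, 10)              # one user embedding vector
m = torch.randn(1, 10)              # one movie embedding vector
dot = (u * m).sum(1)                # earlier approach: dot product -> shape (1,)
nn_in = torch.cat([u, m], dim=1)    # this approach: concatenate -> shape (1, 20)
# nn_in is then fed through linear layers to produce the predicted rating.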

The flow is as follows:

Step 1: Pick an embedding vector from the User Embedding Matrix for a particular user, and an embedding vector from the Movie Embedding Matrix for a particular movie.

Step 2: Concatenate the two vectors. Do the same for every (User, Movie) pair that we want to score. This gives an input matrix in which each row is the concatenation of a user embedding vector and a movie embedding vector. The number of rows equals the number of (User, Movie) pairs, and the number of columns is twice the number of factors (the embedding dimensionality we discussed before, once for the user and once for the movie). So the dimensions we now have are (#(User, Movie) pairs, 2 * #Factors). A shape sketch follows after Step 6.

Step 3: Feed it to the 1st linear layer, whose dimensions are (2 * #Factors, 10). The output will be (#(User, Movie) pairs, 10).

Step 4: Apply a ReLU activation to that output, then pass it through a 2nd linear layer with dimensions (10, 1). The output will be (#(User, Movie) pairs, 1), one number per pair.

Step 5: This is the output we wanted. Earlier, for each user and movie, we took the dot product of their embeddings, which gave us a single number, i.e. the predicted rating. Here we use a neural network to get that single number.

Step 6: We then try to minimize the loss between the predicted rating and the actual rating, and in turn update the model parameters (the embedding values and the layer weights). In this way we make our predictions more accurate.
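To make the shapes concrete, here is a minimal sketch of Steps 1 to 5 on a dummy batch (the batch size of 64 and the 10 factors are assumptions for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

user_vecs  = torch.randn(64, 10)                # Step 1: a batch of user embedding vectors
movie_vecs = torch.randn(64, 10)                # Step 1: a batch of movie embedding vectors
x = torch.cat([user_vecs, movie_vecs], dim=1)   # Step 2: (64, 20)
lin1, lin2 = nn.Linear(20, 10), nn.Linear(10, 1)
x = F.relu(lin1(x))                             # Steps 3-4: (64, 10)
out = lin2(x)                                   # Steps 4-5: (64, 1), one rating per pair
print(out.shape)                                # torch.Size([64, 1])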

Let's check out the hands-on part.

val_idxs = get_cv_idxs(len(ratings))
n_factors = 10
min_rating,max_rating = ratings.rating.min(),ratings.rating.max()
min_rating,max_rating
# Build the model data object that fit() needs later on:
x = ratings.drop(['rating', 'timestamp'], axis=1)
y = ratings['rating'].astype(np.float32)
data = ColumnarModelData.from_data_frame(path, val_idxs, x, y, ['userId', 'movieId'], 64)
def get_emb(ni, nf):
    # Helper defined in the previous post (5.2); re-included here so the
    # snippet runs standalone: an embedding matrix with small uniform init.
    e = nn.Embedding(ni, nf)
    e.weight.data.uniform_(-0.01, 0.01)
    return e

class EmbeddingNet(nn.Module):
    def __init__(self, n_users, n_movies, nh=10, p1=0.05, p2=0.5):
        super().__init__()
        (self.u, self.m) = [get_emb(*o) for o in [
            (n_users, n_factors), (n_movies, n_factors)]]
        # Embedding matrices for users and movies; check out the model
        # printout below for their dimensionality.
        self.lin1 = nn.Linear(n_factors*2, nh)
        # The 1st linear layer's dimensions are (20, 10), since the two
        # 10-dimensional embeddings get concatenated.
        self.lin2 = nn.Linear(nh, 1)
        # The 2nd linear layer's dimensions are (10, 1).
        self.drop1 = nn.Dropout(p1)
        self.drop2 = nn.Dropout(p2)
        # Dropout in both layers for regularization.

    def forward(self, cats, conts):
        users, movies = cats[:,0], cats[:,1]
        x = self.drop1(torch.cat([self.u(users), self.m(movies)], dim=1))
        x = self.drop2(F.relu(self.lin1(x)))
        # Squash to (0, 1) with a sigmoid, then rescale to the rating range.
        return F.sigmoid(self.lin2(x)) * (max_rating-min_rating+1) + min_rating-0.5

wd = 1e-5
model = EmbeddingNet(n_users, n_movies).cuda()
opt = optim.Adam(model.parameters(), 1e-3, weight_decay=wd)
model
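The last line of forward is worth unpacking. For the MovieLens ratings used here, min_rating is 0.5 and max_rating is 5.0, so the sigmoid output in (0, 1) gets rescaled to (0, 5.5), i.e. half a rating point of slack at each end; without that slack the network would need infinitely large activations to ever predict the extreme ratings. A quick check of the formula:

import torch
lo, hi = 0.5, 5.0                               # MovieLens rating range
s = torch.sigmoid(torch.tensor([-10., 0., 10.]))
print(s * (hi - lo + 1) + lo - 0.5)             # approx. tensor([0.00, 2.75, 5.50])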

Let's train our model.

fit(model, data, 3, opt, F.mse_loss)
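Once trained, the model can score any (user, movie) pair directly. A minimal sketch, using hypothetical contiguous indices 0 and 1 (the conts argument is unused by this model, so None is fine):

model.eval()                              # switch off dropout for inference
pair = torch.LongTensor([[0, 1]]).cuda()  # hypothetical user index 0, movie index 1
print(model(pair, None).item())           # predicted rating for that pair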

Finally, we optimized our loss function and got an MSE loss of 0.78 (an RMSE of roughly 0.88 stars), which is reasonably better than our previous approaches as discussed in 5.1 and 5.2.

If you are interested in the code, check out my GitHub repositories.

If you like it, then ABC (Always Be Clapping. 👏 👏👏👏👏😃😃😃😃😃😃😃😃😃👏 👏👏👏👏👏)

If you have any questions, feel free to reach out on the fast.ai forums or on Twitter: @ashiskumarpanda

P.S. - This blog post will be updated and improved as I continue with the other lessons. For more interesting stuff, feel free to check out my GitHub account.


Edit 1: TFW Jeremy Howard approves of your post. 💖💖 🙌🙌🙌 💖💖
