Welcome to the second part of the fifth episode of Fastdotai, where we will deal with **Collaborative Filtering from Scratch**, a technique widely used in recommendation systems. Before we start, I would like to thank **Jeremy Howard** and **Rachel Thomas** for their efforts to democratize AI.

To make the best of this blog post series, feel free to explore the first part of this series in the following order:

- Dog Vs Cat Image Classification
- Dog Breed Image Classification
- Multi-label Image Classification
- Time Series Analysis using Neural Network
- NLP- Sentiment Analysis on IMDB Movie Dataset
- Basic of Movie Recommendation System
- Collaborative Filtering from Scratch
- Collaborative Filtering using Neural Network
- Writing Philosophy like Nietzsche
- Performance of Different Neural Network on Cifar-10 dataset
- ML Model to detect the biggest object in an image Part-1
- ML Model to detect the biggest object in an image Part-2

**Reason behind Netflix and Chill.**

First of all, let's import all the required packages.

```python
# Jupyter magics: auto-reload edited modules, show plots inline
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.learner import *
from fastai.column_data import *
```

Set the paths where:

- Input data is stored.
- Temporary files will be stored (optional; used in Kaggle kernels).
- Model weights will be stored (optional; used in Kaggle kernels).

```python
path='../input/'
tmp_path='/kaggle/working/tmp/'
models_path='/kaggle/working/models/'
```

Next, read the data.

```python
ratings = pd.read_csv(path+'ratings.csv')
ratings.head()
# Each row holds a userId, the movieId that user rated, the rating the
# user gave, and a timestamp of when the rating was made.

movies = pd.read_csv(path+'movies.csv')
movies.head()
# This table is for reference only (e.g. movie titles); it is not used
# for modelling.
```

```python
u_uniq = ratings.userId.unique()
# Map every unique user id to a contiguous index 0..n_users-1.
user2idx = {o:i for i,o in enumerate(u_uniq)}
# Replace each userId with its contiguous index.
ratings.userId = ratings.userId.apply(lambda x: user2idx[x])

# Similarly, we do it for the movies.
m_uniq = ratings.movieId.unique()
movie2idx = {o:i for i,o in enumerate(m_uniq)}
ratings.movieId = ratings.movieId.apply(lambda x: movie2idx[x])

# Number of rows needed in each embedding matrix.
n_users = int(ratings.userId.nunique())
n_movies = int(ratings.movieId.nunique())
```

Converting `movieId` and `userId` into contiguous integers lets us size the embedding matrices. An embedding is indexed by row number, so each ID has to be an integer between 0 and n-1, where n is the number of users (or movies). The original IDs aren't contiguous: they might start at 1 million and have gaps. If we used them directly as row indices, each embedding matrix would need as many rows as the largest ID, most of which would never be used, which wastes memory and slows down processing.
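To see why the remapping matters, here is a toy sketch in plain Python with made-up IDs (no pandas). `dict.fromkeys` stands in for `.unique()` because both keep the order of first appearance; the ID values are hypothetical.

```python
# Hypothetical raw IDs: large and with gaps, like real-world user/movie IDs.
raw_user_ids = [1_000_003, 1_000_017, 1_000_003, 2_500_000, 1_000_017]

# Map each unique raw ID to a contiguous index 0..n-1 (order of first appearance).
uniq = list(dict.fromkeys(raw_user_ids))
user2idx = {o: i for i, o in enumerate(uniq)}
user_idx = [user2idx[x] for x in raw_user_ids]

# Rows an embedding table would need if indexed by raw ID vs by contiguous index.
rows_if_raw = max(raw_user_ids) + 1
rows_if_contiguous = len(user2idx)

print(user_idx)            # [0, 1, 0, 2, 1]
print(rows_if_raw)         # 2500001
print(rows_if_contiguous)  # 3
```

Three users would otherwise need a table with over 2.5 million rows.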

```python
n_factors = 50  # embedding dimensionality (number of latent factors)

class EmbeddingDot(nn.Module):
    def __init__(self, n_users, n_movies):
        super().__init__()
        self.u = nn.Embedding(n_users, n_factors)
        self.m = nn.Embedding(n_movies, n_factors)
        self.u.weight.data.uniform_(0,0.05)
        self.m.weight.data.uniform_(0,0.05)

    def forward(self, cats, conts):
        users,movies = cats[:,0],cats[:,1]
        u,m = self.u(users),self.m(movies)
        return (u*m).sum(1).view(-1, 1)

model = EmbeddingDot(n_users, n_movies).cuda()  # Class instantiation
```

The code above uses object-oriented programming (OOP) concepts, so let me explain it in detail.
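As a warm-up, here is a toy plain-Python sketch of the same OOP mechanics: a constructor, `self`, inheritance and `super().__init__()`. The `Module` base class and `EmbeddingDotToy` are hypothetical stand-ins written for illustration; they only loosely mimic the role `nn.Module` plays and are not PyTorch's code.

```python
class Module:
    """Toy base class standing in for nn.Module (a simplification)."""

    def __init__(self):
        # Set-up that subclasses depend on: a registry of named sub-parts.
        self.children = {}

    def register(self, name, child):
        self.children[name] = child


class EmbeddingDotToy(Module):  # inheritance: EmbeddingDotToy "is a" Module
    def __init__(self, n_users, n_movies):
        super().__init__()            # run the parent constructor first
        self.n_users = n_users        # attributes stored on the object through self
        self.n_movies = n_movies
        self.register("u", ("user embedding", n_users))
        self.register("m", ("movie embedding", n_movies))


model = EmbeddingDotToy(3, 5)  # creating the object calls __init__ automatically
print(model.n_users, sorted(model.children))  # 3 ['m', 'u']
```

If you delete the `super().__init__()` line, `self.register` fails because `self.children` was never created. `nn.Module` behaves similarly: it raises an error when a sub-module is assigned before its constructor has run.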

- `self` is a reference to the object itself (i.e. `model`) once it has been created.
- `def __init__(self, n_users, n_movies):` is a "magic" method. Python calls it automatically whenever an object of the class is created; this kind of method is known as a constructor.
- `model = EmbeddingDot(n_users, n_movies).cuda()` is where the object is created, and with its creation the constructor is called automatically. (`.cuda()` then moves the model's parameters to the GPU.)
- But what is an object? An object (here `model`) is an entity with some attributes and behavior.
- The attributes of our object are its embeddings, with their shapes and initial values, as shown below.

```python
self.u = nn.Embedding(n_users, n_factors)  # User Embeddings
self.m = nn.Embedding(n_movies, n_factors) # Movie Embeddings
self.u.weight.data.uniform_(0,0.05)        # Values for User Embeddings
self.m.weight.data.uniform_(0,0.05)        # Values for Movie Embeddings
```

- Our class inherits from `nn.Module` (the OOP concept of *inheritance*), and the line `super().__init__()` runs the parent class's constructor. This sets up the machinery `nn.Module` needs, so that the embeddings we assign to `self.u` and `self.m` are registered as sub-modules whose parameters PyTorch can track. The embeddings themselves are created with `nn.Embedding`.
- `self.u` is set to an instance of the `Embedding` class. It has a `.weight` attribute that contains the actual embedding matrix. The embedding matrix is a Variable, which is like a Tensor but also supports automatic differentiation.
- To access the underlying Tensor, use the `self.u.weight.data` attribute.
- In `self.u.weight.data.uniform_(0,0.05)`, the trailing underscore denotes an in-place operation: rather than returning a new tensor, it fills the existing matrix with uniform random numbers between 0 and 0.05.
- The `forward` function comes into action when we call `fit`, which comes later on. But let's get into the details of what happens when `forward` is called.

```python
def forward(self, cats, conts):
    users,movies = cats[:,0],cats[:,1]
    u,m = self.u(users),self.m(movies)
    return (u*m).sum(1).view(-1, 1)
```

- `users,movies = cats[:,0],cats[:,1]`: grab a minibatch of users and movies (the two columns of the categorical input).
- `u,m = self.u(users),self.m(movies)`: for that minibatch, look up the corresponding rows of the user and movie embedding matrices.
- `(u*m).sum(1)`: after getting the embeddings for users and movies, we take the dot product of each user vector with its movie vector (element-wise product, summed over the factors) to get a single number, the predicted rating. `.view(-1, 1)` reshapes the result into a column.
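To make the lookup and the dot product concrete, here is a plain-Python sketch with tiny, made-up embedding matrices (2 factors instead of the model's `n_factors`). Lists of lists stand in for PyTorch tensors, and the final `.view(-1, 1)` reshape is left out.

```python
# Toy embedding matrices with 2 factors (hypothetical values).
user_emb = [[0.1, 0.2],   # user 0
            [0.3, 0.4],   # user 1
            [0.5, 0.0]]   # user 2
movie_emb = [[1.0, 2.0],  # movie 0
             [0.0, 1.0]]  # movie 1

# A minibatch of (user, movie) index pairs, like the two columns of `cats`.
cats = [(0, 1), (2, 0), (1, 1)]
users = [c[0] for c in cats]
movies = [c[1] for c in cats]

# Embedding lookup is just row indexing: one factor vector per example.
u = [user_emb[i] for i in users]
m = [movie_emb[j] for j in movies]

# Element-wise product summed over the factors = one dot product per example.
preds = [sum(a * b for a, b in zip(ui, mi)) for ui, mi in zip(u, m)]
print(preds)  # [0.2, 0.5, 0.4]
```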

```python
# x holds the independent variables: the userId and movieId columns.
x = ratings.drop(['rating', 'timestamp'],axis=1)
# y holds the dependent variable, i.e. the ratings.
y = ratings['rating'].astype(np.float32)

# Random validation split (fastai 0.7's get_cv_idxs, as in the course notebook).
val_idxs = get_cv_idxs(len(ratings))

# Arguments:
#   path                  : base path for the data
#   val_idxs              : indices of the rows used for validation
#   x, y                  : independent and dependent variables (above)
#   ['userId', 'movieId'] : list of categorical variables
#   64                    : batch size
data = ColumnarModelData.from_data_frame(path, val_idxs, x, y, ['userId', 'movieId'], 64)
```
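`val_idxs` is simply a list of row positions held out for validation. As a sketch of the idea, here is a hypothetical plain-Python helper, `make_val_idxs`, that picks a random, reproducible share of the rows; it is an illustration, not fastai's code, and the 20% default and seed are assumptions.

```python
import random

def make_val_idxs(n, val_pct=0.2, seed=42):
    """Hypothetical helper: pick a random val_pct share of row indices for validation."""
    rng = random.Random(seed)      # fixed seed -> the same split every run
    idxs = list(range(n))
    rng.shuffle(idxs)
    return sorted(idxs[: int(n * val_pct)])

val_idxs = make_val_idxs(100)
print(len(val_idxs))  # 20
```

Fixing the seed matters: if the split changed between runs, validation losses from different experiments would not be comparable.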

```python
wd=1e-5  # weight decay (L2 regularization strength)

# Optimizer used to update the weights. model.parameters() comes from
# nn.Module and yields all the weights to be learned (here, the two
# embedding matrices); it is passed to the optimizer along with the
# learning rate, weight decay and momentum.
opt = optim.SGD(model.parameters(), 1e-1, weight_decay=wd, momentum=0.9)
```
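To see what `weight_decay` and `momentum` do inside the optimizer, here is a plain-Python sketch of one SGD step per parameter, following the update rule in the `torch.optim.SGD` documentation with the defaults `dampening=0` and `nesterov=False`. The function name `sgd_momentum_step` is hypothetical, and scalars stand in for tensors.

```python
def sgd_momentum_step(params, grads, bufs, lr=0.1, momentum=0.9, weight_decay=1e-5):
    """One SGD-with-momentum update for a list of scalar parameters."""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        g = g + weight_decay * p       # weight decay: L2 penalty added to the gradient
        b = momentum * b + g           # momentum buffer: decaying sum of past gradients
        new_params.append(p - lr * b)  # step against the smoothed gradient
        new_bufs.append(b)
    return new_params, new_bufs

# Two steps with a constant gradient of 0.5 and no weight decay.
params, bufs = [1.0], [0.0]
for _ in range(2):
    params, bufs = sgd_momentum_step(params, [0.5], bufs, weight_decay=0.0)
print(params)  # ≈ [0.855]
```

The second step (0.095) is larger than the first (0.05) because the momentum buffer carries over part of the previous gradient.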

For fitting our data (i.e. training), we were earlier using fast.ai's `learner`, but now we will make use of PyTorch directly. When the `fit` command below is executed, check out the `model.py` file in the fastai folder to see what `fit` does under the hood. Basically, for each minibatch it does:

- A forward pass, by calling the forward function `def forward(self, cats, conts):`
- A backward pass, where PyTorch's automatic differentiation computes the gradients, after which the optimizer updates the embeddings.
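To make the forward / backward / update cycle concrete without PyTorch, here is a toy plain-Python sketch of matrix factorization trained with plain SGD on squared error, with the gradients derived by hand. It illustrates the idea only; it is not what `fit` literally runs (no minibatches, momentum or weight decay), and `train_mf`, the learning rate and the toy ratings are all hypothetical.

```python
import random

def train_mf(ratings, n_users, n_movies, n_factors=2, lr=0.01, epochs=200, seed=0):
    """Toy matrix factorization: forward pass, squared error, manual gradients, update."""
    rng = random.Random(seed)
    U = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    M = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_movies)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for u, m, r in ratings:
            pred = sum(a * b for a, b in zip(U[u], M[m]))  # forward pass
            err = pred - r
            total += err * err                              # squared error
            for k in range(n_factors):
                # backward pass: d(err^2)/dU[u][k] = 2*err*M[m][k], and symmetrically for M
                gu, gm = 2 * err * M[m][k], 2 * err * U[u][k]
                U[u][k] -= lr * gu                          # update step
                M[m][k] -= lr * gm
        losses.append(total / len(ratings))                 # mean squared error per epoch
    return U, M, losses

# (user index, movie index, rating) triples.
toy = [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 5.0), (1, 1, 2.0), (2, 1, 1.5)]
U, M, losses = train_mf(toy, n_users=3, n_movies=2)
print(round(losses[0], 3), round(losses[-1], 3))
```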

`fit(model, data, 3, opt, F.mse_loss)`

Here we don't get the benefit of SGDR (stochastic gradient descent with restarts), so we manually lower the learning rate, train again, and check the loss.

`set_lrs(opt, 0.01)`

`fit(model, data, 3, opt, F.mse_loss)`

Our model performs well, but since we aren't using SGDR, the loss is higher than the one we got earlier.
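For reference, SGDR anneals the learning rate along a cosine curve and periodically jumps it back up (a "warm restart"). Below is a minimal plain-Python sketch of that schedule; `sgdr_lr` and its parameters are hypothetical names, and fastai's own implementation may differ in details such as whether it steps per batch or per epoch. The manual `set_lrs(opt, 0.01)` call is a crude, single-step version of this decay.

```python
import math

def sgdr_lr(iteration, cycle_len, lr_max=0.1, lr_min=0.0):
    """Cosine annealing with warm restarts: within each cycle the LR decays from
    lr_max towards lr_min along half a cosine, then jumps back to lr_max."""
    t = iteration % cycle_len  # position inside the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

schedule = [round(sgdr_lr(i, cycle_len=4), 4) for i in range(8)]
print(schedule)  # [0.1, 0.0854, 0.05, 0.0146, 0.1, 0.0854, 0.05, 0.0146]
```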

**How can we further improve the model?**

Now we will take bias into consideration. Some users are highly enthusiastic and rate all movies higher on average, and some movies are rated higher than others by almost everyone. To capture this, we add a constant for each user and a constant for each movie. These constants are known as biases.
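In symbols, writing $\mathbf{u}_i$ for the embedding of user $i$, $\mathbf{m}_j$ for the embedding of movie $j$, and $b^{(u)}_i$, $b^{(m)}_j$ for their biases (notation introduced here for illustration), the predicted rating before any rescaling becomes:

```latex
\hat{r}_{ij} = \mathbf{u}_i \cdot \mathbf{m}_j + b^{(u)}_i + b^{(m)}_j
```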

```python
# Range of ratings in the data, used below to scale the predictions.
min_rating,max_rating = ratings.rating.min(),ratings.rating.max()
min_rating,max_rating
```

```python
def get_emb(ni,nf):
    # ni: number of rows (e.g. users), nf: number of factors (embedding dimensionality)
    e = nn.Embedding(ni, nf)            # create the embedding matrix
    e.weight.data.uniform_(-0.01,0.01)  # fill it with random values in (-0.01, 0.01)
    return e

class EmbeddingDotBias(nn.Module):
    def __init__(self, n_users, n_movies):
        super().__init__()
        # Create embeddings for users (self.u) and movies (self.m), plus a
        # user bias (self.ub) and a movie bias (self.mb), by calling get_emb().
        (self.u, self.m, self.ub, self.mb) = [get_emb(*o) for o in [
            (n_users, n_factors), (n_movies, n_factors), (n_users,1), (n_movies,1)
        ]]

    def forward(self, cats, conts):
        users,movies = cats[:,0],cats[:,1]
        um = (self.u(users)* self.m(movies)).sum(1)
        # Add in the user and movie biases. The bias lookups have shape
        # (batch, 1); .squeeze() turns them into (batch,) to match um.
        res = um + self.ub(users).squeeze() + self.mb(movies).squeeze()
        # F.sigmoid squishes res into (0, 1); the scaling then maps it into
        # (min_rating, max_rating), so a good movie gets a high predicted
        # rating and a bad one gets a low one.
        res = F.sigmoid(res) * (max_rating-min_rating) + min_rating
        return res.view(-1, 1)
```
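Here is a plain-Python sketch of the biased prediction for a single (user, movie) pair, to show how the sigmoid scaling keeps outputs inside the rating range. `biased_prediction` is a hypothetical name, and its defaults assume a 0.5 to 5 star scale; in the model above the bounds come from `min_rating` and `max_rating` in the data.

```python
import math

def biased_prediction(u_vec, m_vec, user_bias, movie_bias, min_rating=0.5, max_rating=5.0):
    """Dot product plus user and movie bias, squashed into (min_rating, max_rating)."""
    raw = sum(a * b for a, b in zip(u_vec, m_vec)) + user_bias + movie_bias
    sig = 1.0 / (1.0 + math.exp(-raw))  # sigmoid: maps any real number into (0, 1)
    return sig * (max_rating - min_rating) + min_rating

print(biased_prediction([0.0, 0.0], [0.0, 0.0], 0.0, 0.0))  # 2.75, the midpoint
print(biased_prediction([1.0, 1.0], [2.0, 2.0], 1.0, 1.0))  # close to 5.0
```

A raw score of 0 lands in the middle of the range, and large positive or negative scores approach, but never exceed, the bounds.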

```python
wd=2e-4
model = EmbeddingDotBias(n_users, n_movies).cuda()
opt = optim.SGD(model.parameters(), 1e-1, weight_decay=wd, momentum=0.9)

fit(model, data, 3, opt, F.mse_loss)

# Lower the learning rate manually and train for 3 more epochs.
set_lrs(opt, 1e-2)
fit(model, data, 3, opt, F.mse_loss)
```

Finally, we reach a loss (mean squared error) of 0.8, which is reasonably good.

*If you like it, then **ABC** (**Always Be Clapping**). 👏👏👏👏👏 😃😃😃😃😃😃😃😃😃 👏👏👏👏👏👏*

If you have any questions, feel free to reach out on the fast.ai forums or on Twitter: @ashiskumarpanda

*P.S. This blog post will be updated and improved as I continue with other lessons. For more interesting stuff, feel free to check out my **Github** account.*


**Edit 1: TFW Jeremy Howard approves of your post.** 💖💖 🙌🙌🙌 💖💖