Photo by Chewy on Unsplash

Dogs vs Cats Image Classifier using Deep Learning

I have been going through fast.ai for a couple of months, and I have to admit I learned a lot of great techniques in the process. I’ll make sure to write about all of them on my blog. Thanks to Jeremy Howard and Rachel Thomas for their efforts to democratize AI, and to the awesome fast.ai community for all the quick help.

The picture below depicts my journey so far, which makes it an interesting one.

To make the best of this blog post series, feel free to explore its parts in the following order:

  1. Dog Vs Cat Image Classification
  2. Dog Breed Image Classification
  3. Multi-label Image Classification
  4. Time Series Analysis using Neural Network
  5. NLP- Sentiment Analysis on IMDB Movie Dataset
  6. Basic of Movie Recommendation System
  7. Collaborative Filtering from Scratch
  8. Collaborative Filtering using Neural Network
  9. Writing Philosophy like Nietzsche
  10. Performance of Different Neural Network on Cifar-10 dataset
  11. ML Model to detect the biggest object in an image Part-1
  12. ML Model to detect the biggest object in an image Part-2

So brace yourselves and focus on Part 1 Lesson 2 of the fast.ai course.

DOG VS CAT IMAGE CLASSIFIER:

Importing the packages and preparing the data for the deep learning model.

This blog post deals with the Dogs vs Cats image classification model, as taught by Jeremy Howard in Part 1 Lesson 2 of the fast.ai course.

Import all the libraries below that will be used in this machine learning model.

!pip install fastai==0.7.0
!pip install torchtext==0.2.3
!pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl 
!pip3 install torchvision


import fastai
from matplotlib import pyplot as plt

# Put these at the top of every notebook, to get automatic reloading
# and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# This file contains all the main external libs we'll use
# from fastai.imports import *

from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

Check whether the GPU has been enabled using the following commands:

[Screenshot: commands checking that the GPU is enabled]

The output of the above commands should be True.

Before moving forward, I would like to mention a couple of Linux commands that may come in handy.

Use these commands prefixed with a ‘!’ mark:

  1. !ls lists the files present in the current directory.
  2. !pwd stands for “print working directory” and prints the path of the current directory.
  3. !cd stands for “change directory”.

The above three commands help to navigate between directories.

The images of dogs and cats can be downloaded using the following command:

!wget http://files.fast.ai/data/dogscats.zip

The structure of the folder should be as follows:

[Screenshot: the dogscats folder structure]

Set the path to where the data is stored:

PATH = "data/dogscats/"
sz=224

Check whether the files have been downloaded using the following command:

[Screenshot: listing the downloaded files using an f-string path]

The f’ above refers to f-strings, a new way to format strings in Python. To learn more about f-strings, check this awesome guide by realpython.com.
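For instance, here is a tiny self-contained sketch of how an f-string builds the path strings used in this notebook (the file names are made up for illustration):

```python
# f-strings (Python 3.6+) evaluate the expression inside {} and
# interpolate the result directly into the string literal.
PATH = "data/dogscats/"
files = ["cat.0.jpg", "dog.0.jpg"]

print(f"ls {PATH}")                 # → ls data/dogscats/
print(f"found {len(files)} files")  # → found 2 files
```
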

Pre-trained Deep Learning Model for Image Classification

The classification task will make use of a pre-trained deep learning model, i.e. one that has already been trained on similar data by someone else. So instead of training a deep learning model from scratch, we will use a model that has been trained on ImageNet, a dataset consisting of 1.2 million images and 1000 classes. ResNet34 is the version of the model that will be used here; it is a special type of convolutional neural network, and ResNet won the 2015 ImageNet competition. The details of ResNet will be discussed in an upcoming blog post.

The following lines of code show how to train the model using fastai.

[Screenshot: creating the data object and learner, and calling fit]

The resnet34 architecture is saved in the arch variable. The data is saved in the data variable, which looks for the data in the PATH specified earlier. The tfms argument is part of data augmentation, which will be dealt with later in detail.

The pretrained method creates a new neural network from the arch model (resnet34). The fit method trains the model using the learning rate and the number of epochs specified, and an accuracy of 0.9895 is obtained.

GRADIENT DESCENT

[Screenshot: gradient descent converging on a loss curve]

Let me explain the image above. Initially, the chosen parameters are random, and the loss is high at that point. A high loss indicates that, during training, the difference between the predicted values and the target values (labels) is large. Hence we need an approach that makes this difference as small as possible. Convergence, or reaching the local minimum, means the loss is at its lowest and the difference between the predictions and the targets is the smallest. The process of iteratively stepping toward this minimum is known as gradient descent.
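The whole process can be sketched in a few lines of plain Python on a toy one-dimensional loss (just the idea, not the real model):

```python
# Toy gradient descent on the loss L(w) = (w - 3)**2, whose minimum
# is at w = 3. The gradient is dL/dw = 2 * (w - 3).
def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)   # gradient of the loss at the current w
        w = w - lr * grad    # step downhill, scaled by the learning rate
    return w

w_final = gradient_descent(w=10.0)
print(round(w_final, 4))  # converges close to the minimum at w = 3
```
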

LEARNING RATE:-

The learning rate (LR) in the fit function above is one of the most important parameters and should be carefully chosen so that the model reaches an optimal solution quickly and efficiently. It basically controls how fast we move toward the optimal point of the function. If the LR is too low, the process is slow; if it is too high, there is a great chance it will overshoot the minimum. So the LR has to be chosen carefully to make sure convergence (reaching the local minimum) happens efficiently. The image below illustrates this concept.

[Screenshot: effect of learning rate size on convergence]

How to Choose the Best Learning Rate ?

!!! Don’t worry, Jeremy Howard has your back. Jeremy has described a wonderful way of finding the learning rate, known as the

LEARNING RATE FINDER.

Please check the code below.

[Screenshot: learn.lr_find() and the learning-rate and loss plots]

Using lr_find(), a good learning rate can be found. As the learning rate vs. iteration graph shows, the LR is increased after each minibatch, and it increases exponentially. In the second plot, loss vs. learning rate, we observe that the loss decreases for a while as the learning rate increases; when the learning rate reaches 0.1 the loss is at its minimum, after which it starts to rise again (meaning the LR has become so high that it overshoots the minimum and the loss gets worse).

To choose the best learning rate, follow these steps:

  1. Determine the lowest point in the loss vs. learning rate graph above (i.e. at 0.1).
  2. Step back by one order of magnitude (i.e. to 0.01) and choose that as the learning rate.
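Here is a toy sketch of the idea in plain Python (not fastai’s actual lr_find() implementation): sweep the learning rate up exponentially, record the loss for each value on a simple quadratic problem, then step back one order of magnitude from the lowest point.

```python
def loss(w):            # toy loss with its minimum at w = 3
    return (w - 3) ** 2

def run(lr, steps=10, w=10.0):
    # take a few gradient steps at a fixed learning rate, return final loss
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return loss(w)

# lr_find idea: sweep the learning rate up exponentially, record the loss
lrs, losses = [], []
lr = 1e-4
while lr < 10:
    lrs.append(lr)
    losses.append(run(lr))
    lr *= 1.3                            # exponential increase per "minibatch"

best = lrs[losses.index(min(losses))]    # LR where the loss bottomed out
chosen = best / 10                       # step back one order of magnitude
```

For very large learning rates the final loss blows up, exactly like the right-hand side of the loss vs. learning rate plot.
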

The idea behind stepping back by one order of magnitude:

Although the loss is at its minimum at that point, the learning rate there is too high, and continuing with it will not lead to convergence. Please check the image below for an explanation.

[Screenshot: why a learning rate at the loss minimum overshoots]

NOTE: The learning rate is one of the most important hyperparameters, and choosing it properly will yield the best results.

IMPROVING THE DEEP LEARNING MODEL

One way to improve the model is to give it more data, hence we use data augmentation. But wait, why data augmentation?

Our model generally has a few million parameters, and when trained for more epochs there is a great chance it will start overfitting. Overfitting means the model is learning the specific details of the images in the training dataset too well, and may not generalize to the validation or test dataset. In other words, overfitting happens when the accuracy on the validation dataset is lower than the accuracy on the training dataset (or the loss on the training dataset is much lower than the loss on the validation dataset). Overfitting can be reduced by providing more data to the model, hence data augmentation is used.
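In code, the symptom looks like this (the loss numbers are illustrative, not from the actual training run):

```python
# Overfitting, in code: the training loss keeps falling while the
# validation loss stops improving and turns back up.
def is_overfitting(train_losses, val_losses):
    return train_losses[-1] < train_losses[0] and val_losses[-1] > min(val_losses)

train = [0.9, 0.5, 0.3, 0.1, 0.05]
val   = [0.8, 0.5, 0.4, 0.45, 0.55]   # validation loss turns back up
print(is_overfitting(train, val))     # True
```
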

Note: Data augmentation does not create new data, but it allows the convolutional neural network to learn to recognize dogs or cats from very different angles.

For data augmentation, we pass transforms_side_on to the aug_tfms parameter of the tfms_from_model() function. transforms_side_on produces different versions of an image by flipping it horizontally, as if the photo had been taken from the other side, and also rotates the images by small amounts, slightly varies their contrast and brightness, zooms in a bit, and shifts them around a bit. The variations can be seen in the image below.

[Screenshot: augmented versions of a cat image]
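The core transform, a horizontal flip, can be sketched on a tiny nested-list “image” (purely illustrative, not fastai’s implementation):

```python
# A horizontal flip reverses each row of pixels, producing the
# mirror-image version of the picture.
def flip_horizontal(img):
    return [row[::-1] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
print(flip_horizontal(img))  # [[3, 2, 1], [6, 5, 4]]
```
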

To put data augmentation in place, write the following code:

tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)

Although room has been created for data augmentation, it won’t actually happen yet, because precompute=True was set initially.

Let me explain the following code in detail and its relation to the statement above:

data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 1)

When declaring the architecture with ConvLearner.pretrained(…), precompute is set to True, which tells the learner to use the activations from the pretrained network. A pretrained network is one which has already learnt to recognize certain things; for our Dogs vs Cats study, it has already learned to classify 1000 classes over the 1.2 million images of the ImageNet dataset. So we take the penultimate layer (the layer which has all the information necessary to figure out what the image is) and save its activations for each image; these are known as precomputed activations. When creating a new classifier, we can take advantage of these precomputed activations and quickly train a model on top of them. To implement this, set precompute=True.

Note: When precompute=True, data augmentation doesn’t work: even though a different version of a cat is shown each time, the activations for one particular version of that cat have already been precomputed and are simply reused. It takes a minute or two to precompute the activations when the code runs for the first time.
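The idea behind precomputed activations can be sketched with a simple cache; body() here is a hypothetical stand-in for the frozen pretrained layers:

```python
# Sketch of precomputed activations: run the expensive frozen "body" of
# the network once per image, cache the result, and reuse it every epoch.
def body(image):
    return [sum(image), max(image)]   # pretend these are the activations

cache = {}

def features(image_id, image):
    if image_id not in cache:     # computed only the first time...
        cache[image_id] = body(image)
    return cache[image_id]        # ...then reused, even if `image` changed

feats = features("cat.0.jpg", [1, 2, 3])
# A second call with an augmented version of the same image still returns
# the cached activations, which is exactly why augmentation has no effect
# while precompute=True.
same = features("cat.0.jpg", [3, 2, 1])
```
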

When trained using precomputed activations, the accuracy is 98.8%:

[Screenshot: training output with precomputed activations]

To put data augmentation to work, set precompute=False and check the accuracy. In the code below, cycle_len is an important parameter and will be dealt with in detail later in this post.

[Screenshot: training output with precompute=False and cycle_len=1]

The accuracy has increased a bit, to 99.1%, and the good news is that the model is not overfitting, while the training loss has decreased further. To improve the model further, let’s focus on:

SGDR(STOCHASTIC GRADIENT DESCENT WITH RESTARTS)

SGDR says that as we get closer and closer to the minimum, we should decrease the learning rate. The idea of decreasing the learning rate as training progresses (i.e. with more iterations) is known as learning rate annealing. There is stepwise annealing and cosine annealing; in this course Jeremy Howard uses cosine annealing.

[Screenshot: stepwise and cosine learning rate annealing schedules]

In cosine annealing, we train with a higher learning rate when far from the minimum; when getting close to the local minimum, we switch to a lower learning rate and do a few iterations on top of that.

The diagram above shows a simple loss function. In reality, datasets are represented in a very high-dimensional space, with many fairly flat points that aren’t local minima. Suppose our surface looks like the diagram below:

[Screenshot: loss surface with a narrow spiky minimum and a wide flat one]

Starting at red point 1, we reach the global minimum at red point 2, but the solution there doesn’t generalize well: on a slightly different dataset it will not give good results. Red point 3, on the other hand, will generalize well on a slightly different dataset. A standard learning rate annealing approach will go downhill to one spot, and in high dimensions there is a great chance of getting stuck in a spiky region that doesn’t generalize well, hence not a good solution. Instead, we can use a learning rate scheduler that resets the learning rate and does cosine annealing again, jumping from point 2 to point 3 and so on, until it reaches a point where the generalization is really good.

Each time the learning rate is reset, it increases again, which kicks the model out of the nasty spiky part of the surface; eventually it lands in a nice smooth bowl that generalizes better.

This process is known as SGDR (stochastic gradient descent with restarts). The best part of SGDR is that once a nice smooth region is reached, the restarts don’t knock the model out of it anymore: it hangs around in this nice part of the space and keeps getting better at finding a reasonably good spot. Please check the diagram below.

[Screenshot: loss surface exploration with SGDR restarts]

Using SGDR along with the learning rate finder gives better results. Try to visually pick a good learning rate from the learning rate finder, or else SGDR won’t jump to a nice smooth surface. The reset of the learning rate is controlled by the cycle_len parameter; cycle_len=1 means the learning rate is reset after every epoch. The image below shows how the reset happens:

[Screenshot: learning rate schedule with cycle_len=1]

Note: The learning rate is reset after every single epoch, as cycle_len=1, and it keeps changing after every single minibatch. The y-axis is the learning rate, where 0.010 is the value we got from the learning rate finder, so SGDR cycles the learning rate between 0 and 0.010.
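The schedule itself is simple to write down; here is a plain-Python sketch of cosine annealing with restarts (illustrative, not fastai’s exact code):

```python
import math

def sgdr_lr(iteration, iters_per_cycle, lr_max=0.01):
    # Cosine annealing with restarts: within each cycle the LR falls from
    # lr_max to ~0 along a cosine curve, then jumps back up (the restart).
    t = iteration % iters_per_cycle
    return lr_max / 2 * (1 + math.cos(math.pi * t / iters_per_cycle))

schedule = [sgdr_lr(i, iters_per_cycle=100) for i in range(300)]
# starts at lr_max, decays toward 0, and restarts at iterations 100 and 200
```
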

It is advised to keep saving the model at intermediate steps. To do so, use the following commands:

learn.save('224_lastlayer')
learn.load('224_lastlayer')

The model is saved in the models folder within the dogscats folder, as shown below:

[Screenshot: the models folder inside the dogscats folder]

All the precomputed activations are saved in the tmp folder. So in case of weird errors, perhaps due to half-completed precomputed activations, go ahead and delete the tmp folder and check whether the error goes away. This is the fastai way of turning it off and on again.

Note: Precomputed activations don’t involve any training; they are what the pretrained model produces with the weights we downloaded.

What else can we do to make the model better?

So far the pretrained activations have been downloaded and used directly; the precomputed weights in the CNN kernels have been left untouched (i.e. no retraining of the pretrained weights has been done yet). The pretrained model already knows how to find edges, curves, and gradients in its early layers, then repeating patterns, and eventually the main features. Until now, only new layers were added on top, and the model learned how to mix and match the pretrained features. If a model trained on ImageNet is applied to a case like satellite image classification, where the features are completely different, most of the layers need to be retrained. Hence a new concept needs to be explored:

FINE TUNING AND DIFFERENTIAL LEARNING RATE

To learn a different set of features, or to tell the learner that the convolutional filters need to change, simply unfreeze all the layers of the neural network. A frozen layer is one whose weights are not trained or updated.

!!! Okay Okay Elsa ,I’ll let it go and unfreeze the layers !!! 😍 😍

Unfreezing the layers makes their weights open to training. But the initial layers need little or no training compared to the later layers of the model: this holds almost universally, because the job of the initial layers is to learn edges and curves, while the later layers learn the important, more specific features. Hence the learning rate is set differently for different sets of layers. This concept is known as differential learning rates.

learn.unfreeze()
lr=np.array([1e-4,1e-3,1e-2])
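Conceptually, each layer group takes a gradient step scaled by its own entry of that lr array; here is a purely illustrative sketch with made-up weights and gradients:

```python
# Each layer group takes a gradient step scaled by its own learning rate:
# 1e-4 for the early layers, 1e-3 for the middle, 1e-2 for the head.
lrs = [1e-4, 1e-3, 1e-2]

# one (weight, gradient) pair per layer group -- made-up numbers
groups = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5)]

updated = [w - lr * g for (w, g), lr in zip(groups, lrs)]
print(updated)  # the early-layer weight barely moves; the head moves most
```
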

After making the required changes, train the model as shown below.

[Screenshot: training output after unfreezing, with differential learning rates]

Earlier, the cycle_len=1 and number_of_cycles=3 parameters were discussed. As a refresher: cycle_len=1 is the number of epochs per cycle, and number_of_cycles=3 means the learner will do 3 cycles of 1 epoch each. Now a new parameter is introduced: cycle_mult=2. The cycle_mult parameter multiplies the length of each cycle after each cycle; here the multiplication factor is 2, hence (1 + 2×1 + 2×2) epochs = 7 epochs. If the cycle length is too short, the learner starts going down to find a reasonably good spot, then pops out, goes down again, and pops out again; it never actually finds a spot that is both a good minimum and good at generalizing, and it isn’t really exploring the surface. So to explore the surface more, set cycle_mult=2. Now the schedule looks much more exploratory:

[Screenshot: learning rate schedule with cycle_mult=2]
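The epoch arithmetic can be checked with a few lines (a sketch of the cycle lengths, not fastai’s code):

```python
def total_epochs(cycle_len, n_cycles, cycle_mult):
    # each successive cycle is cycle_mult times longer than the previous one
    epochs, length = 0, cycle_len
    for _ in range(n_cycles):
        epochs += length
        length *= cycle_mult
    return epochs

print(total_epochs(cycle_len=1, n_cycles=3, cycle_mult=2))  # 1 + 2 + 4 = 7
```
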

As observed, the accuracy has now reached 99.0% and the losses have decreased drastically. There is one last way to make the model better, known as:

TEST TIME DATA AUGMENTATION (TTA)

On the validation/test dataset, all inputs are required to be square, which helps the GPU process them quickly; it can’t process fast if the input images have different dimensions. To make the inputs consistent, each picture is cropped to a square from the middle, as in the following example:

[Screenshot: example image where center cropping removes the animal’s head]

If the picture above is cropped to a square at the center, it will be hard for the model to predict whether it is a dog or a cat, as only the body makes it into the validation image. For this, test time augmentation (TTA) is used: it takes four augmented versions of the image at random, as well as the un-augmented original center-cropped image, and then averages the predictions on all of these images. That average is our final prediction.

Note: Applicable to the test and validation datasets only.
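Here is a plain-Python sketch of the averaging step, with a hypothetical predict() standing in for the real model and simple lambdas standing in for the random augmentations:

```python
# TTA sketch: average the model's predicted class probabilities over the
# original center crop and several augmented copies of the same image.
def tta_predict(predict, image, augmentations):
    preds = [predict(image)] + [predict(aug(image)) for aug in augmentations]
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]

def predict(img):            # hypothetical model: probabilities [cat, dog]
    return [0.5 + 0.1 * img, 0.5 - 0.1 * img]

augs = [lambda x: x + 1, lambda x: x - 1]  # stand-ins for crops/flips
result = tta_predict(predict, 1, augs)
print(result)   # averages the three predictions into roughly [0.6, 0.4]
```
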

[Screenshot: accuracy output after applying TTA]

As seen above, the accuracy after applying TTA is 99.35%.

CONFUSION MATRIX

To get a summary of our classifier, plot a confusion matrix. In image classification, a confusion matrix shows how many examples were correctly or incorrectly predicted, as in the image below.

[Screenshot: confusion matrix for the Dogs vs Cats classifier]

Interpreting the confusion matrix for the image classification task:

The confusion matrix tells us how good our image classifier is. As seen above, the dark blue regions have been classified correctly: 996 cat pictures were classified as cats and 993 dog pictures as dogs, while 7 dog pictures were classified as cats and 4 cat pictures as dogs. Hence our classifier is doing a pretty good job.
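The overall accuracy can be recomputed directly from those four numbers:

```python
# Reading the confusion matrix: rows are the true class, columns the
# prediction. The counts below are the ones quoted above.
cat_as_cat, cat_as_dog = 996, 4
dog_as_dog, dog_as_cat = 993, 7

correct = cat_as_cat + dog_as_dog
total = correct + cat_as_dog + dog_as_cat
print(f"accuracy = {correct / total:.4f}")  # (996 + 993) / 2000 = 0.9945
```
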


Hope you find this post helpful. In future blog posts we will go deeper. With so many important concepts covered, you might feel like this now.

!! Hang on, more such interesting stuff is coming soon. Until then, goodbye 😉!!

P.S. In case you are interested, check out the code here.

A B C. Always be clapping. 👏👏👏😃

Edit 1:- TFW Jeremy Howard approves of your post . 😃😃😃😃😃😃

