Menu Close

Using CNNs to detect Humans and Dog Breeds

In this project, we will build a pipeline that can process real-world, user-supplied images. Given an image of a dog, the algorithm will identify an estimate of the canine’s breed. If supplied an image of a human, the code will identify the resembling dog breed.

This project was supplied by Udacity and uses various pre-trained CNN for image classification including VGG16. Datasets are provided by Udacity.

We will explore different models to detect humans and dogs, and classify dog breeds, including models built from scratch and pre-trained image classification models publicly available.

The goal is to explore the accuracy of various models and discover the benefits of using transfer learning.

Lets start by exploring our dataset:

There are 133 total dog categories. 
There are 8351 total dog images.
There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.
There are 13233 total human images.

Using OpenCV’s implementation of Haar feature-based cascade classifiers we can detect human faces in images

Let us assess the human face detector using our given dataset. We will run the face detector on the first 100 training images in our dataset (100 human faces, and 100 dog faces, but first we will need to grey scale our images:

100 Human faces detected in human face images 
11 Human faces detected in dog face images

We can see that OpenCVs implementation can accurately predict around 100% of human faces, but can detect some dog faces as human faces

Lets try to use the pre-trained ResNet-50 model to detect dogs in images.
Our code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.
(Source Udacity – DSNS)

The steps we will take:

  • Pre-process the data. We will use TensorFlow as a backend, with python-keras CNNs which require a 4D array (# samples, rows, columns, channels)
  • Train our model (using our dog images)
  • Make predictions with our model
  • Test the accuracy of our model

Our algorithm will do the following:

  1. Detect a human face in the image or a dog face in the image
  2. Return a dog breed which most closely matches the image (human or dog)

Using ResNet50 we have tried to make a dog detector to see if it can correctly detect a human or dog. If a dog breed is returned, we can say that the ResNet50 model detected a dog, otherwise it is a human. Our result:

0 Dog faces detected in human face images 
Percentage of Humans detected as dogs: 0.0

100 Dog faces detected in dog face images
Percentage of dogs detected as dogs: 100.0

Good start!

Pre-Processing: Before we can start running CNN models we will convert our data into a sensor, with each colour (RED, BLUE, GREEN) as a different vector in the tensor. We will then convert it to a 4D tensor, with an arbitrary 1 dimensional element in the first dimension. This will result in a final tensor size of (1,255,255,3) Since the images used in our dataset have already been standardized for use in image classification problems, we do not have to worry about abnormal data such as NANs or outliers. How convenient! Now our pre-processing is complete.

Now lets create a CNN to classify Dog Breeds from scratch:

The model we have implemented from scratch

We train our model over 20 epochs, with a batch size of 20: Test accuracy: 4.5455%

We can see that in order to successfully detect dog-breeds if we are using a model made from scratch we would have to do significant training! Lets try leveraging a pre-trained model.

VGG16 Model:

We take a pre-trained VGG16 model and add the following layers:

We add the connecting layers (133 final dense layer) because we have 133 classifications of dog breeds.

Lets train using the same batch size and epoch size:

loss: 6.9934 - acc: 0.5527 - val_loss: 7.8975 - val_acc: 0.4419 

We can see that over the same amount of training we have a much better accuracy on the validation set of 55.3%. When we test our model against the test data we have an accuracy score of 42.8%. A big improvement!

Now we will try another pre-trained model, VGG16 and add the same connecting layers to classify our breeds. We will also increase our epoch size (400) and our batch size (4000).

loss: 2.8463
acc: 0.8229
val_loss: 4.0593
val_acc: 0.6240

Our accuracy on the test set has jumped considerably! 82%. And on our test result we achieve an accuracy of 63.2%. Lets put it all together.

We will take an image, detect a human or dog face, and classify the closet dog breed to the match. Lets try on a sample of 6 images, 3 humans and 3 dogs.

Hello human!  You look like a dog of type ages/train/112.Nova_scotia_duck_tolling_retriever. 

../../../data/lfw/Rick_Dinse/Rick_Dinse_0002.jpg Hello human! You look like a dog of type ages/train/014.Basenji.

../../../data/lfw/JC_Chasez/JC_Chasez_0001.jpg Hello human! You look like a dog of type ages/train/131.Wirehaired_pointing_griffon.

../../../data/lfw/Carlos_Menem/Carlos_Menem_0013.jpg Dog detected of type ages/train/076.Golden_retriever.

../../../data/dog_images/train/095.Kuvasz/Kuvasz_06442.jpg Dog detected of type ages/train/057.Dalmatian.

../../../data/dog_images/train/057.Dalmatian/Dalmatian_04054.jpg Dog detected of type ages/train/088.Irish_water_spaniel.

../../../data/dog_images/train/088.Irish_water_spaniel/Irish_water_spaniel_06014.jpg

From the above output, we can see that we have detected all 3 humans correctly as humans and returned their closest resembling dog breed. We have also correctly identified the dog breed in 2 of 3 images of dogs (66.7%). As expected since our test accuracy was about 63%

Conclusions

We have explored this problem and came across many interesting responses on our performance by exploring the available image recognition models. We can conclude the following:

  1. OpenCV’s implementation of Haar feature-based cascade classifiers is very accurate in detecting human faces. We have seen 100% accuracy with our given dataset.
  2. Building a CNN from scratch for image classification can be challenging and time consuming. In our first attempt, only managed an accuracy of 4.5%. We can continue to increase the batch size, epoch size and training data size to get better results. This would be a blind approach, without knowing exactly what combination will lead to acceptable results.
  3. Using pre-trained image classifications is a highly effective shortcut to our given problem. We can leverage a model already pre-trained for image classification, and provide it with our own image dataset to recalibrate the model and get very good results. With a VGG16 model, epoch size of 400, and batch size of 4000, we were able to achieve 63% accuracy with a model trained in the matter of minutes. We can continue to increase the epoch size, batch size, and provide a larger training set to increase our accuracy.

For details on the code and implementation please see my github repo: https://github.com/neilmistry/dogapp

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: