Skip to content

Latest commit

 

History

History
92 lines (77 loc) · 4.58 KB

File metadata and controls

92 lines (77 loc) · 4.58 KB

Dog Breed Classification using InceptionV3 CNN Model on Stanford Dogs Dataset

Description

The Stanford Dogs Dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization. It was originally collected for fine-grain image categorization, a challenging problem as certain dog breeds have near identical features or differ in colour and age.

I have used the InceptionV3 CNN Model, which is pre-trained on the ImageNet dataset for classification. Data augementation has been used for making the model generalize better and also to avoid overfitting. The model achieved an accuracy of 80% on validation set, which is decent for this dataset.

Getting Started

The inceptionV3-for-stanford-dogs-dataset.ipynb notebook can be directly run on Kaggle after loading the dataset in the Kaggle Kernel. Use Kaggle's Nvidia Tesla P100 GPU for faster training and evaluation.

Pre-Requisites

For running the notebook on your local machine, following pre-requisites must be satisfied:

  • NumPy
  • Pandas
  • Scikit-image
  • IPython
  • Matplotlib
  • Tensorflow 2.X
  • Keras

Installation

Dependencies:

# With Tensorflow CPU
pip install -r requirements.txt

# With Tensorflow GPU
pip install -r requirements-gpu.txt

Nvidia Driver (For GPU, if you haven't set it up already):

# Ubuntu 20.04
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt install nvidia-driver-430

# Windows/Other
https://www.nvidia.com/Download/index.aspx

Dataset

Contents of the dataset:

  • Number of categories: 120
  • Number of images: 20,580
  • Annotations: Class labels, Bounding boxes

The dataset can be downloaded from here.

Sample images of 50 different categories from the dataset:

Images of Dogs

Approach

Data Augmentation

Data augmentation is done through the following techniques:

  • Rescaling (1./255)
  • Shear Transformation (0.2)
  • Zoom (0.2)
  • Horizontal Flipping
  • Rotation (20)
  • Width Shifting (0.2)
  • Height Shifting (0.2)

Augmented Image

Model Details

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
inception_v3 (Functional)    (None, 5, 5, 2048)        21802784  
_________________________________________________________________
global_average_pooling2d (Gl (None, 2048)              0         
_________________________________________________________________
dropout (Dropout)            (None, 2048)              0         
_________________________________________________________________
dense (Dense)                (None, 120)               245880    
=================================================================
Total params: 22,048,664
Trainable params: 245,880
Non-trainable params: 21,802,784
_________________________________________________________________

A detailed layout of the model is available here.

Training Results

Model Accuracy and Loss

The training_csv.log file contains epoch wise training details.

References

  • The original data source is found on http://vision.stanford.edu/aditya86/ImageNetDogs/ and contains additional information on the train/test splits and baseline results.
  • Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao and Li Fei-Fei. Novel dataset for Fine-Grained Image Categorization. First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [pdf] [poster] [BibTex]
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009. [pdf] [BibTex]
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision, arXiv:1512.00567v3, 2015. [pdf]