Knowledge distillation of shufflenet_v2 and resnet models on the mini-ImageNet dataset.
Download mini-ImageNet, put it in the ../data/ directory, and unzip it.
git clone https://github.com/wangziren1/knowledge_distill_on_mini-imagenet.git
cd knowledge_distill_on_mini-imagenet
pip install -r requirements.txt
- Train a single model:
python3 main.py --arch shufflenet_v2_x1_0 --workers 8 --epochs 100 --batch 64 --warmup-epochs 1 --name shufflenet_v2_x1_0 ../data/mini-imagenet/
--name: save directory
- Distill knowledge from a big model into a small model:
python3 kd.py --arch shufflenet_v2_x1_0 --big shufflenet_v2_x2_0 --workers 8 --epochs 100 --batch 64 --warmup-epochs 1 --name kd_T3_alpha0.9 --T 3 --alpha 0.9 ../data/mini-imagenet/
--arch: small (student) model
--big: big (teacher) model
--name: save directory
--T: temperature
--alpha: weight of the soft loss; total_loss = alpha * soft_loss + (1 - alpha) * hard_loss
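The distillation loss implied by these flags follows the standard soft-target formulation. Below is a minimal sketch (our own illustration, not necessarily identical to what kd.py does; the T*T scaling of the soft loss is the usual convention):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=3.0, alpha=0.6):
    # Soft loss: KL divergence between the temperature-softened teacher and
    # student distributions, scaled by T*T so gradient magnitudes stay
    # comparable across temperatures (standard convention).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: ordinary cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)
    # total_loss = alpha * soft_loss + (1 - alpha) * hard_loss
    return alpha * soft_loss + (1 - alpha) * hard_loss
```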
We run several training-trick experiments on the shufflenet_v2_x1_0 model.
Tricks | Accuracy (%)
---|---
step LR | 76.71
cosine LR | 77.72
cosine LR + warmup 1 epoch | 78.84
cosine LR + warmup 3 epochs | 77.76
In the subsequent experiments we use the "cosine LR + warmup 1 epoch" setting.
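For reference, a minimal sketch of what a cosine schedule with a 1-epoch linear warmup can look like (our own illustration; the base learning rate and the exact schedule used in main.py are assumptions):

```python
import math

def learning_rate(epoch, base_lr=0.1, total_epochs=100, warmup_epochs=1):
    """epoch may be fractional (epoch + batch_idx / batches_per_epoch).
    base_lr=0.1 is an assumed value, not necessarily what main.py uses."""
    if epoch < warmup_epochs:
        # Linear warmup from 0 to base_lr during the first warmup_epochs.
        return base_lr * epoch / warmup_epochs
    # Cosine annealing from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```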
Baseline accuracies with this setting:

Model | shufflenet_v2_x1_0 | shufflenet_v2_x1_5 | shufflenet_v2_x2_0
---|---|---|---
Accuracy (%) | 78.84 | 79.1 | 80.83
Next, we distill the knowledge in shufflenet_v2_x2_0 into shufflenet_v2_x1_0. T is the temperature and alpha is the weight of the soft loss (total_loss = alpha * soft_loss + (1 - alpha) * hard_loss).
alpha\T | 1 | 3 | 5 | 10
---|---|---|---|---
0.9 | 79.64 | 81.10 | 80.65 | 80.875
0.7 | | 81.27 | |
0.6 | | 81.28 | |
0.5 | | 80.78 | |
The highest accuracy is obtained with T = 3 and alpha = 0.6, so we use this setting in the resnet experiment.
Baseline accuracies of the resnet models:

Model | resnet18 | resnet34 | resnet50
---|---|---|---
Accuracy (%) | 79.11 | 81 | 81.79
Distill the knowledge in resnet50 into resnet18:
alpha\T | 3 |
---|---|
0.6 | 82.45 |
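This presumably corresponds to a kd.py invocation of the following form (our guess based on the flags documented above; the --name value is illustrative):

python3 kd.py --arch resnet18 --big resnet50 --workers 8 --epochs 100 --batch 64 --warmup-epochs 1 --name kd_resnet18_T3_alpha0.6 --T 3 --alpha 0.6 ../data/mini-imagenet/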
- Knowledge distillation improves the accuracy of the small model over training it alone, and the distilled small model can even surpass the big model (e.g. distilled resnet18 reaches 82.45 vs. 81.79 for resnet50).
- Knowledge distillation also improves generalization, narrowing the gap between train and validation accuracy. For shufflenet_v2_x1_0, train/val accuracy goes from 86.46/78.84 (a gap of 7.62) without distillation to 87.28/81.28 (a gap of 6.00) with distillation. For resnet18, it goes from 93.28/79.11 (a gap of 14.17) to 92.53/82.45 (a gap of 10.08).