NNCF Compressed Model Zoo

Ready-to-use compressed LLMs can be found on the OpenVINO Hugging Face page. Each model card lists the NNCF parameters that were used to compress the model.

INT8 Post-Training Quantization (PTQ) results for public Vision, NLP and GenAI models can be found on the OpenVINO Performance Benchmarks page. PTQ results for ONNX models are available in the ONNX section below.

Quantization-Aware Training (QAT) results for PyTorch and TensorFlow public models can be found below.

PyTorch

PyTorch Classification

| Model | Compression algorithm | Dataset | Accuracy (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| GoogLeNet | - | ImageNet | 69.77 | Config | - |
| GoogLeNet | • Filter pruning: 40%, geometric median criterion | ImageNet | 69.47 (0.30) | Config | Download |
| Inception V3 | - | ImageNet | 77.33 | Config | - |
| Inception V3 | • QAT: INT8 | ImageNet | 77.45 (-0.12) | Config | Download |
| Inception V3 | • QAT: INT8<br>• Sparsity: 61% (RB) | ImageNet | 76.36 (0.97) | Config | Download |
| MobileNet V2 | - | ImageNet | 71.87 | Config | - |
| MobileNet V2 | • QAT: INT8 | ImageNet | 71.07 (0.80) | Config | Download |
| MobileNet V2 | • QAT: INT8 (per-tensor only) | ImageNet | 71.24 (0.63) | Config | Download |
| MobileNet V2 | • QAT: Mixed, 58.88% INT8 / 41.12% INT4 | ImageNet | 70.95 (0.92) | Config | Download |
| MobileNet V2 | • QAT: INT8<br>• Sparsity: 52% (RB) | ImageNet | 71.09 (0.78) | Config | Download |
| MobileNet V3 (Small) | - | ImageNet | 67.66 | Config | - |
| MobileNet V3 (Small) | • QAT: INT8 | ImageNet | 66.98 (0.68) | Config | Download |
| ResNet-18 | • Filter pruning: 40%, magnitude criterion | ImageNet | 69.27 (0.49) | Config | Download |
| ResNet-18 | • Filter pruning: 40%, geometric median criterion | ImageNet | 69.31 (0.45) | Config | Download |
| ResNet-18 | • Accuracy-aware compressed training<br>• Filter pruning: 60%, geometric median criterion | ImageNet | 69.2 (-0.6) | Config | - |
| ResNet-34 | - | ImageNet | 73.30 | Config | - |
| ResNet-34 | • Filter pruning: 50%, geometric median criterion<br>• Knowledge distillation | ImageNet | 73.11 (0.19) | Config | Download |
| ResNet-50 | - | ImageNet | 76.15 | Config | - |
| ResNet-50 | • QAT: INT8 | ImageNet | 76.46 (-0.31) | Config | Download |
| ResNet-50 | • QAT: INT8 (per-tensor only) | ImageNet | 76.39 (-0.24) | Config | Download |
| ResNet-50 | • QAT: Mixed, 43.12% INT8 / 56.88% INT4 | ImageNet | 76.05 (0.10) | Config | Download |
| ResNet-50 | • QAT: INT8<br>• Sparsity: 61% (RB) | ImageNet | 75.42 (0.73) | Config | Download |
| ResNet-50 | • QAT: INT8<br>• Sparsity: 50% (RB) | ImageNet | 75.50 (0.65) | Config | Download |
| ResNet-50 | • Filter pruning: 40%, geometric median criterion | ImageNet | 75.57 (0.58) | Config | Download |
| ResNet-50 | • Accuracy-aware compressed training<br>• Filter pruning: 52.5%, geometric median criterion | ImageNet | 75.23 (0.93) | Config | - |
| SqueezeNet V1.1 | - | ImageNet | 58.19 | Config | - |
| SqueezeNet V1.1 | • QAT: INT8 | ImageNet | 58.22 (-0.03) | Config | Download |
| SqueezeNet V1.1 | • QAT: INT8 (per-tensor only) | ImageNet | 58.11 (0.08) | Config | Download |
| SqueezeNet V1.1 | • QAT: Mixed, 52.83% INT8 / 47.17% INT4 | ImageNet | 57.57 (0.62) | Config | Download |
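
The value in parentheses in the table above appears to be the baseline metric minus the compressed metric, so a positive number is a loss and a negative number is a gain over the uncompressed model. The sketch below reproduces two of the listed values under that assumption; the helper name `accuracy_drop` is made up for illustration and is not part of NNCF.

```python
def accuracy_drop(baseline: float, compressed: float) -> float:
    """Return baseline minus compressed, rounded to two decimals as in the table."""
    return round(baseline - compressed, 2)


# Values copied from the GoogLeNet and Inception V3 rows of the table above.
googlenet_drop = accuracy_drop(69.77, 69.47)  # 40% filter pruning
inception_drop = accuracy_drop(77.33, 77.45)  # QAT: INT8

print(googlenet_drop)  # positive: the pruned model lost accuracy
print(inception_drop)  # negative: the quantized model scored higher than the baseline
```

The same reading applies to the mAP and mIoU tables further down. Rows whose baseline is not listed in this document (for example ResNet-18) cannot be rechecked this way.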

PyTorch Object Detection

| Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| SSD300‑MobileNet | - | VOC12+07 train, VOC07 eval | 62.23 | Config | Download |
| SSD300‑MobileNet | • QAT: INT8<br>• Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 62.95 (-0.72) | Config | Download |
| SSD300‑VGG‑BN | - | VOC12+07 train, VOC07 eval | 78.28 | Config | Download |
| SSD300‑VGG‑BN | • QAT: INT8 | VOC12+07 train, VOC07 eval | 77.81 (0.47) | Config | Download |
| SSD300‑VGG‑BN | • QAT: INT8<br>• Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 77.66 (0.62) | Config | Download |
| SSD300‑VGG‑BN | • Filter pruning: 40%, geometric median criterion | VOC12+07 train, VOC07 eval | 78.35 (-0.07) | Config | Download |
| SSD512-VGG‑BN | - | VOC12+07 train, VOC07 eval | 80.26 | Config | Download |
| SSD512-VGG‑BN | • QAT: INT8 | VOC12+07 train, VOC07 eval | 80.04 (0.22) | Config | Download |
| SSD512-VGG‑BN | • QAT: INT8<br>• Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 79.68 (0.58) | Config | Download |
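
Several rows above list "Sparsity: 70% (Magnitude)". As a rough illustration of what magnitude-based sparsity means, the sketch below zeroes the 70% of weights with the smallest absolute value in a toy weight list. It is a one-shot simplification written for this document: the function name and the sample weights are hypothetical, and it does not reproduce how NNCF schedules or applies sparsity during training.

```python
def magnitude_sparsify(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_zero = round(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first num_to_zero are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_zero])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only to make the result easy to follow.
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.15, 0.6, 0.02, -0.4]
sparse = magnitude_sparsify(weights, sparsity=0.7)
print(sparse)
```

Only the three largest-magnitude weights survive in this example; the remaining seven are set to zero.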

PyTorch Semantic Segmentation

| Model | Compression algorithm | Dataset | mIoU (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| ICNet | - | CamVid | 67.89 | Config | Download |
| ICNet | • QAT: INT8 | CamVid | 67.89 (0.00) | Config | Download |
| ICNet | • QAT: INT8<br>• Sparsity: 60% (Magnitude) | CamVid | 67.16 (0.73) | Config | Download |
| UNet | - | CamVid | 71.95 | Config | Download |
| UNet | • QAT: INT8 | CamVid | 71.89 (0.06) | Config | Download |
| UNet | • QAT: INT8<br>• Sparsity: 60% (Magnitude) | CamVid | 72.46 (-0.51) | Config | Download |
| UNet | - | Mapillary | 56.24 | Config | Download |
| UNet | • QAT: INT8 | Mapillary | 56.09 (0.15) | Config | Download |
| UNet | • QAT: INT8<br>• Sparsity: 60% (Magnitude) | Mapillary | 55.69 (0.55) | Config | Download |
| UNet | • Filter pruning: 25%, geometric median criterion | Mapillary | 55.64 (0.60) | Config | Download |
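
The "geometric median criterion" named in the filter pruning rows can be pictured with a small sketch. A common approximation of this criterion scores each filter by the sum of its distances to every other filter in the layer and prunes the filters with the lowest scores, on the reasoning that they are the most replaceable. The code below assumes that approximation; the function names and the toy filters are hypothetical, and NNCF's actual implementation may differ in its details.

```python
import math


def geometric_median_scores(filters: list[list[float]]) -> list[float]:
    """Score each flattened filter by the sum of its Euclidean distances to all filters."""
    return [sum(math.dist(f, g) for g in filters) for f in filters]


def select_filters_to_prune(filters: list[list[float]], pruning_rate: float) -> list[int]:
    """Return indices of the filters closest to the geometric median of the layer."""
    scores = geometric_median_scores(filters)
    num_to_prune = round(len(filters) * pruning_rate)
    order = sorted(range(len(filters)), key=lambda i: scores[i])
    return sorted(order[:num_to_prune])


# Four hypothetical 2-element filters: three near each other, one far away.
filters = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [-2.0, 3.0]]
to_prune = select_filters_to_prune(filters, pruning_rate=0.25)
print(to_prune)
```

With a 25% pruning rate, one of the four filters is selected. It is the one sitting in the middle of the cluster, not the outlier, which is the main contrast with a magnitude criterion.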

PyTorch NLP (HuggingFace Transformers-powered models)

| PyTorch Model | Compression algorithm | Dataset | Accuracy (drop) % |
|---|---|---|---|
| BERT-base-cased | • QAT: INT8 | CoNLL2003 | 99.18 (-0.01) |
| BERT-base-cased | • QAT: INT8 | MRPC | 84.8 (-0.24) |
| BERT-base-chinese | • QAT: INT8 | XNLI | 77.22 (0.46) |
| BERT-large<br>(Whole Word Masking) | • QAT: INT8 | SQuAD v1.1 | F1: 92.68 (0.53) |
| DistilBERT-base | • QAT: INT8 | SST-2 | 90.3 (0.8) |
| GPT-2 | • QAT: INT8 | WikiText-2 (raw) | perplexity: 20.9 (-1.17) |
| MobileBERT | • QAT: INT8 | SQuAD v1.1 | F1: 89.4 (0.58) |
| RoBERTa-large | • QAT: INT8 | MNLI | matched: 89.25 (1.35) |

TensorFlow

TensorFlow Classification

| Model | Compression algorithm | Dataset | Accuracy (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| Inception V3 | - | ImageNet | 77.91 | Config | - |
| Inception V3 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 78.39 (-0.48) | Config | Download |
| Inception V3 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Sparsity: 61% (RB) | ImageNet | 77.52 (0.39) | Config | Download |
| Inception V3 | • Sparsity: 54% (Magnitude) | ImageNet | 77.86 (0.05) | Config | Download |
| MobileNet V2 | - | ImageNet | 71.85 | Config | - |
| MobileNet V2 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 71.63 (0.22) | Config | Download |
| MobileNet V2 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Sparsity: 52% (RB) | ImageNet | 70.94 (0.91) | Config | Download |
| MobileNet V2 | • Sparsity: 50% (RB) | ImageNet | 71.34 (0.51) | Config | Download |
| MobileNet V2 (TensorFlow Hub MobileNet V2) | • Sparsity: 35% (Magnitude) | ImageNet | 71.87 (-0.02) | Config | Download |
| MobileNet V3 (Large) | - | ImageNet | 75.80 | Config | - |
| MobileNet V3 (Large) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 75.04 (0.76) | Config | Download |
| MobileNet V3 (Large) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Sparsity: 42% (RB) | ImageNet | 75.24 (0.56) | Config | Download |
| MobileNet V3 (Small) | - | ImageNet | 68.38 | Config | - |
| MobileNet V3 (Small) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 67.79 (0.59) | Config | Download |
| MobileNet V3 (Small) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Sparsity: 42% (Magnitude) | ImageNet | 67.44 (0.94) | Config | Download |
| ResNet-50 | - | ImageNet | 75.05 | Config | - |
| ResNet-50 | • QAT: INT8 | ImageNet | 74.99 (0.06) | Config | Download |
| ResNet-50 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Sparsity: 65% (RB) | ImageNet | 74.36 (0.69) | Config | Download |
| ResNet-50 | • Sparsity: 80% (RB) | ImageNet | 74.38 (0.67) | Config | Download |
| ResNet-50 | • Filter pruning: 40%, geometric median criterion | ImageNet | 74.96 (0.09) | Config | Download |
| ResNet-50 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Filter pruning: 40%, geometric median criterion | ImageNet | 75.09 (-0.04) | Config | Download |
| ResNet-50 | • Accuracy-aware compressed training<br>• Sparsity: 65% (Magnitude) | ImageNet | 74.37 (0.67) | Config | - |
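
The quantization descriptions above combine two choices: symmetric versus asymmetric ranges, and per-tensor versus per-channel scales. The sketch below shows the textbook form of each in plain Python so the terms are easier to compare. It is an illustration under simplifying assumptions, not NNCF's quantizer: all function names and sample values are hypothetical, ranges are taken directly from the data, and the "half-range" variant is not modeled.

```python
def quantize_symmetric(values: list[float], num_bits: int = 8) -> list[float]:
    """Fake-quantize on a signed grid centered on zero (zero maps to integer 0)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return list(values)
    return [round(v / scale) * scale for v in values]


def quantize_asymmetric(values: list[float], num_bits: int = 8) -> list[float]:
    """Fake-quantize on an unsigned grid that spans [min(values), max(values)]."""
    levels = 2 ** num_bits - 1  # 255 steps for INT8
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0.0:
        return list(values)
    return [round((v - lo) / scale) * scale + lo for v in values]


def quantize_per_channel(channels: list[list[float]], num_bits: int = 8) -> list[list[float]]:
    """Apply symmetric quantization with a separate scale for each channel."""
    return [quantize_symmetric(ch, num_bits) for ch in channels]


# Hypothetical weights: one channel with tiny values, one with large values.
channels = [[0.01, -0.02], [1.0, -2.0]]
flat = [w for ch in channels for w in ch]

per_tensor = quantize_symmetric(flat)        # one scale shared by all weights
per_channel = quantize_per_channel(channels)  # one scale per channel

# Hypothetical non-negative activations, where an asymmetric range wastes no levels.
activations = [0.0, 0.1, 2.0, 6.0]
quantized_activations = quantize_asymmetric(activations)

print(abs(per_tensor[0] - 0.01), abs(per_channel[0][0] - 0.01))
```

In this toy case the per-tensor scale is dominated by the large channel, so the small weight `0.01` is reproduced far less accurately than with per-channel scales. That trade-off is one plausible reason the table lists the granularity for each model.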

TensorFlow Object Detection

| Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| RetinaNet | - | COCO 2017 | 33.43 | Config | Download |
| RetinaNet | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | 33.12 (0.31) | Config | Download |
| RetinaNet | • Sparsity: 50% (Magnitude) | COCO 2017 | 33.10 (0.33) | Config | Download |
| RetinaNet | • Filter pruning: 40% | COCO 2017 | 32.72 (0.71) | Config | Download |
| RetinaNet | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations)<br>• Filter pruning: 40% | COCO 2017 | 32.67 (0.76) | Config | Download |
| YOLO v4 | - | COCO 2017 | 47.07 | Config | Download |
| YOLO v4 | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | 46.20 (0.87) | Config | Download |
| YOLO v4 | • Sparsity: 50% (Magnitude) | COCO 2017 | 46.49 (0.58) | Config | Download |

TensorFlow Instance Segmentation

| Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
|---|---|---|---|---|---|
| Mask‑R‑CNN | - | COCO 2017 | bbox: 37.33<br>segm: 33.56 | Config | Download |
| Mask‑R‑CNN | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | bbox: 37.19 (0.14)<br>segm: 33.54 (0.02) | Config | Download |
| Mask‑R‑CNN | • Sparsity: 50% (Magnitude) | COCO 2017 | bbox: 36.94 (0.39)<br>segm: 33.23 (0.33) | Config | Download |

ONNX

ONNX Classification

| ONNX Model | Compression algorithm | Dataset | Accuracy (drop) % |
|---|---|---|---|
| DenseNet-121 | PTQ | ImageNet | 60.16 (0.8) |
| GoogleNet | PTQ | ImageNet | 66.36 (0.3) |
| MobileNet V2 | PTQ | ImageNet | 71.38 (0.49) |
| ResNet-50 | PTQ | ImageNet | 74.63 (0.21) |
| ShuffleNet | PTQ | ImageNet | 47.25 (0.18) |
| SqueezeNet V1.0 | PTQ | ImageNet | 54.3 (0.54) |
| VGG‑16 | PTQ | ImageNet | 72.02 (0.0) |

ONNX Object Detection

| ONNX Model | Compression algorithm | Dataset | mAP (drop) % |
|---|---|---|---|
| SSD1200 | PTQ | COCO2017 | 20.17 (0.17) |
| Tiny-YOLOv2 | PTQ | VOC12 | 29.03 (0.23) |