Ready-to-use compressed LLMs can be found on the OpenVINO Hugging Face page. Each model card includes the NNCF parameters that were used to compress the model.
INT8 Post-Training Quantization (PTQ) results for public Vision, NLP and GenAI models can be found on the OpenVINO Performance Benchmarks page. PTQ results for ONNX models are available in the ONNX section below.
Quantization-Aware Training (QAT) results for PyTorch and TensorFlow public models can be found below.
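In the tables below, the number in parentheses next to a compressed model's metric is its drop relative to the uncompressed baseline, in percentage points (baseline minus compressed, up to rounding). A negative drop means the compressed model scored higher than the baseline: Inception V3 with INT8 QAT scores 77.45 against a 77.33 baseline, so it is listed as -0.12. The helper below is a hypothetical illustration of that arithmetic, not part of NNCF; it reproduces three rows of the first table.

```python
def accuracy_drop(baseline: float, compressed: float) -> float:
    """Drop in percentage points, rounded to two decimals.

    Positive: the compressed model is less accurate than the baseline.
    Negative: the compressed model is more accurate than the baseline.
    """
    return round(baseline - compressed, 2)


# (row description, baseline %, compressed %) taken from the first table
rows = [
    ("GoogLeNet, filter pruning 40%", 69.77, 69.47),
    ("Inception V3, QAT INT8", 77.33, 77.45),
    ("MobileNet V2, QAT INT8", 71.87, 71.07),
]

for name, base, comp in rows:
    print(f"{name}: {comp:.2f} ({accuracy_drop(base, comp):.2f})")
# GoogLeNet, filter pruning 40%: 69.47 (0.30)
# Inception V3, QAT INT8: 77.45 (-0.12)
# MobileNet V2, QAT INT8: 71.07 (0.80)
```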
Model | Compression algorithm | Dataset | Accuracy (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
GoogLeNet | - | ImageNet | 69.77 | Config | - |
GoogLeNet | • Filter pruning: 40%, geometric median criterion | ImageNet | 69.47 (0.30) | Config | Download |
Inception V3 | - | ImageNet | 77.33 | Config | - |
Inception V3 | • QAT: INT8 | ImageNet | 77.45 (-0.12) | Config | Download |
Inception V3 | • QAT: INT8 • Sparsity: 61% (RB) | ImageNet | 76.36 (0.97) | Config | Download |
MobileNet V2 | - | ImageNet | 71.87 | Config | - |
MobileNet V2 | • QAT: INT8 | ImageNet | 71.07 (0.80) | Config | Download |
MobileNet V2 | • QAT: INT8 (per-tensor only) | ImageNet | 71.24 (0.63) | Config | Download |
MobileNet V2 | • QAT: Mixed, 58.88% INT8 / 41.12% INT4 | ImageNet | 70.95 (0.92) | Config | Download |
MobileNet V2 | • QAT: INT8 • Sparsity: 52% (RB) | ImageNet | 71.09 (0.78) | Config | Download |
MobileNet V3 (Small) | - | ImageNet | 67.66 | Config | - |
MobileNet V3 (Small) | • QAT: INT8 | ImageNet | 66.98 (0.68) | Config | Download |
ResNet-18 | • Filter pruning: 40%, magnitude criterion | ImageNet | 69.27 (0.49) | Config | Download |
ResNet-18 | • Filter pruning: 40%, geometric median criterion | ImageNet | 69.31 (0.45) | Config | Download |
ResNet-18 | • Accuracy-aware compressed training • Filter pruning: 60%, geometric median criterion | ImageNet | 69.2 (-0.6) | Config | - |
ResNet-34 | - | ImageNet | 73.30 | Config | - |
ResNet-34 | • Filter pruning: 50%, geometric median criterion • Knowledge distillation | ImageNet | 73.11 (0.19) | Config | Download |
ResNet-50 | - | ImageNet | 76.15 | Config | - |
ResNet-50 | • QAT: INT8 | ImageNet | 76.46 (-0.31) | Config | Download |
ResNet-50 | • QAT: INT8 (per-tensor only) | ImageNet | 76.39 (-0.24) | Config | Download |
ResNet-50 | • QAT: Mixed, 43.12% INT8 / 56.88% INT4 | ImageNet | 76.05 (0.10) | Config | Download |
ResNet-50 | • QAT: INT8 • Sparsity: 61% (RB) | ImageNet | 75.42 (0.73) | Config | Download |
ResNet-50 | • QAT: INT8 • Sparsity: 50% (RB) | ImageNet | 75.50 (0.65) | Config | Download |
ResNet-50 | • Filter pruning: 40%, geometric median criterion | ImageNet | 75.57 (0.58) | Config | Download |
ResNet-50 | • Accuracy-aware compressed training • Filter pruning: 52.5%, geometric median criterion | ImageNet | 75.23 (0.93) | Config | - |
SqueezeNet V1.1 | - | ImageNet | 58.19 | Config | - |
SqueezeNet V1.1 | • QAT: INT8 | ImageNet | 58.22 (-0.03) | Config | Download |
SqueezeNet V1.1 | • QAT: INT8 (per-tensor only) | ImageNet | 58.11 (0.08) | Config | Download |
SqueezeNet V1.1 | • QAT: Mixed, 52.83% INT8 / 47.17% INT4 | ImageNet | 57.57 (0.62) | Config | Download |
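The ResNet-34 row combines filter pruning with knowledge distillation, where the compressed model (the student) is trained to match the outputs of the original model (the teacher). As a rough illustration, here is the generic temperature-softened formulation in plain Python. The function names, the temperature of 4.0 and the 0.5 weighting are assumptions for this sketch; the loss type and weighting NNCF actually uses for distillation may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of `logits` after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


def cross_entropy(logits, label):
    """Standard cross-entropy of `logits` against an integer class label."""
    return -math.log(softmax(logits)[label])


def total_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Weighted sum of the distillation term and the hard-label task loss."""
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    task = cross_entropy(student_logits, label)
    return alpha * kd + (1.0 - alpha) * task


teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(round(total_loss(student, teacher, label=0), 4))
```

The distillation term is zero only when the student's softened distribution matches the teacher's, so during fine-tuning it pulls the pruned network back toward the original model's behavior.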
Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
SSD300‑MobileNet | - | VOC12+07 train, VOC07 eval | 62.23 | Config | Download |
SSD300‑MobileNet | • QAT: INT8 • Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 62.95 (-0.72) | Config | Download |
SSD300‑VGG‑BN | - | VOC12+07 train, VOC07 eval | 78.28 | Config | Download |
SSD300‑VGG‑BN | • QAT: INT8 | VOC12+07 train, VOC07 eval | 77.81 (0.47) | Config | Download |
SSD300‑VGG‑BN | • QAT: INT8 • Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 77.66 (0.62) | Config | Download |
SSD300‑VGG‑BN | • Filter pruning: 40%, geometric median criterion | VOC12+07 train, VOC07 eval | 78.35 (-0.07) | Config | Download |
SSD512-VGG‑BN | - | VOC12+07 train, VOC07 eval | 80.26 | Config | Download |
SSD512-VGG‑BN | • QAT: INT8 | VOC12+07 train, VOC07 eval | 80.04 (0.22) | Config | Download |
SSD512-VGG‑BN | • QAT: INT8 • Sparsity: 70% (Magnitude) | VOC12+07 train, VOC07 eval | 79.68 (0.58) | Config | Download |
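Several rows above apply "Sparsity: 70% (Magnitude)": the weights with the smallest absolute values are set to zero until the stated fraction of weights is zero. The sketch below shows only that masking step on a flat list of weights. The helper names are hypothetical; during training the sparsity level is typically raised gradually under a schedule rather than applied in one shot, and regularization-based (RB) sparsity, which learns the mask instead, is not modeled here.

```python
def magnitude_sparsify(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    num_zero = int(round(len(weights) * sparsity))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_zero])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_level(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


w = [0.5, -0.1, 0.05, -0.8, 0.3, 0.02, -0.6, 0.9, 0.01, -0.2]
sparse_w = magnitude_sparsify(w, 0.7)
print(sparse_w)                  # [0.0, 0.0, 0.0, -0.8, 0.0, 0.0, -0.6, 0.9, 0.0, 0.0]
print(sparsity_level(sparse_w))  # 0.7
```

Sorting indices rather than comparing against a threshold value keeps the resulting sparsity exact even when several weights share the same magnitude.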
Model | Compression algorithm | Dataset | mIoU (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
ICNet | - | CamVid | 67.89 | Config | Download |
ICNet | • QAT: INT8 | CamVid | 67.89 (0.00) | Config | Download |
ICNet | • QAT: INT8 • Sparsity: 60% (Magnitude) | CamVid | 67.16 (0.73) | Config | Download |
UNet | - | CamVid | 71.95 | Config | Download |
UNet | • QAT: INT8 | CamVid | 71.89 (0.06) | Config | Download |
UNet | • QAT: INT8 • Sparsity: 60% (Magnitude) | CamVid | 72.46 (-0.51) | Config | Download |
UNet | - | Mapillary | 56.24 | Config | Download |
UNet | • QAT: INT8 | Mapillary | 56.09 (0.15) | Config | Download |
UNet | • QAT: INT8 • Sparsity: 60% (Magnitude) | Mapillary | 55.69 (0.55) | Config | Download |
UNet | • Filter pruning: 25%, geometric median criterion | Mapillary | 55.64 (0.60) | Config | Download |
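Filter pruning removes whole convolution filters, and the criterion decides which ones go. With a magnitude criterion, filters with the smallest norm (the L2 norm in this sketch) are removed. With the geometric median criterion, as described in the FPGM line of work, filters closest to the geometric median of the layer's filters, that is, those with the smallest summed distance to all the others, are treated as the most redundant. The sketch below ranks a few flattened toy filters both ways; the filter values and helper names are made up for illustration and are not NNCF code.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def magnitude_scores(filters):
    """Larger norm means more important."""
    return [l2_norm(f) for f in filters]


def geometric_median_scores(filters):
    """Sum of distances to every filter; a small sum means the filter sits near
    the geometric median and is well represented by the others."""
    return [sum(distance(f, g) for g in filters) for f in filters]


def filters_to_prune(filters, ratio, criterion):
    """Indices of the `ratio` fraction of filters with the lowest scores."""
    num_prune = int(len(filters) * ratio)
    scores = criterion(filters)
    order = sorted(range(len(filters)), key=lambda i: scores[i])
    return sorted(order[:num_prune])


# Five flattened 2-element "filters".
filters = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [3.0, 3.0], [0.1, 0.0]]
print(filters_to_prune(filters, 0.4, magnitude_scores))         # [2, 4]
print(filters_to_prune(filters, 0.4, geometric_median_scores))  # [0, 2]
```

The two criteria disagree here: magnitude drops the two smallest filters, while the geometric median criterion keeps the tiny filter 4 and drops filters 0 and 2, which lie near the center of the set.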
Model | Compression algorithm | Dataset | Accuracy (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
Inception V3 | - | ImageNet | 77.91 | Config | - |
Inception V3 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 78.39 (-0.48) | Config | Download |
Inception V3 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) • Sparsity: 61% (RB) | ImageNet | 77.52 (0.39) | Config | Download |
Inception V3 | • Sparsity: 54% (Magnitude) | ImageNet | 77.86 (0.05) | Config | Download |
MobileNet V2 | - | ImageNet | 71.85 | Config | - |
MobileNet V2 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 71.63 (0.22) | Config | Download |
MobileNet V2 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) • Sparsity: 52% (RB) | ImageNet | 70.94 (0.91) | Config | Download |
MobileNet V2 | • Sparsity: 50% (RB) | ImageNet | 71.34 (0.51) | Config | Download |
MobileNet V2 (TensorFlow Hub MobileNet V2) | • Sparsity: 35% (Magnitude) | ImageNet | 71.87 (-0.02) | Config | Download |
MobileNet V3 (Large) | - | ImageNet | 75.80 | Config | - |
MobileNet V3 (Large) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 75.04 (0.76) | Config | Download |
MobileNet V3 (Large) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) • Sparsity: 42% (RB) | ImageNet | 75.24 (0.56) | Config | Download |
MobileNet V3 (Small) | - | ImageNet | 68.38 | Config | - |
MobileNet V3 (Small) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | ImageNet | 67.79 (0.59) | Config | Download |
MobileNet V3 (Small) | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) • Sparsity: 42% (Magnitude) | ImageNet | 67.44 (0.94) | Config | Download |
ResNet-50 | - | ImageNet | 75.05 | Config | - |
ResNet-50 | • QAT: INT8 | ImageNet | 74.99 (0.06) | Config | Download |
ResNet-50 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) • Sparsity: 65% (RB) | ImageNet | 74.36 (0.69) | Config | Download |
ResNet-50 | • Sparsity: 80% (RB) | ImageNet | 74.38 (0.67) | Config | Download |
ResNet-50 | • Filter pruning: 40%, geometric median criterion | ImageNet | 74.96 (0.09) | Config | Download |
ResNet-50 | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) • Filter pruning: 40%, geometric median criterion | ImageNet | 75.09 (-0.04) | Config | Download |
ResNet-50 | • Accuracy-aware compressed training • Sparsity: 65% (Magnitude) | ImageNet | 74.37 (0.67) | Config | - |
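The rows above spell out the quantization scheme: per-tensor symmetric for weights, per-tensor asymmetric for activations. Symmetric mode uses a single scale with the zero point fixed at 0, which suits roughly zero-centered weights; asymmetric mode adds an integer zero point so that a skewed range, such as mostly non-negative activations, can use the whole 8-bit grid. The sketch below implements both on plain Python lists. It assumes a [-127, 127] signed grid for the symmetric case and a [0, 255] unsigned grid for the asymmetric case, and it does not model the half-range variant named in the table.

```python
def quantize_symmetric(values, num_bits=8):
    """Per-tensor symmetric: one scale, zero point fixed at 0 (assumed grid [-127, 127])."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_symmetric(q, scale):
    return [x * scale for x in q]


def quantize_asymmetric(values, num_bits=8):
    """Per-tensor asymmetric: scale plus integer zero point map [lo, hi] onto [0, 255]."""
    qmax = 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range to include 0 so that
    hi = max(max(values), 0.0)  # 0.0 is exactly representable
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]


weights = [-0.5, 0.25, 1.0, -1.27]
activations = [-1.0, 0.0, 3.0, 1.2]

w_q, w_scale = quantize_symmetric(weights)
a_q, a_scale, a_zero_point = quantize_asymmetric(activations)
print(w_q, w_scale)
print(a_q, a_scale, a_zero_point)
print(dequantize_asymmetric(a_q, a_scale, a_zero_point))
```

A symmetric grid would waste most of its negative half on a skewed range like `activations`, which is why the zero point is worth its small extra cost there.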
Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
RetinaNet | - | COCO 2017 | 33.43 | Config | Download |
RetinaNet | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | 33.12 (0.31) | Config | Download |
RetinaNet | • Sparsity: 50% (Magnitude) | COCO 2017 | 33.10 (0.33) | Config | Download |
RetinaNet | • Filter pruning: 40% | COCO 2017 | 32.72 (0.71) | Config | Download |
RetinaNet | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) • Filter pruning: 40% | COCO 2017 | 32.67 (0.76) | Config | Download |
YOLO v4 | - | COCO 2017 | 47.07 | Config | Download |
YOLO v4 | • QAT: INT8 (per-channel symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | 46.20 (0.87) | Config | Download |
YOLO v4 | • Sparsity: 50% (Magnitude) | COCO 2017 | 46.49 (0.58) | Config | Download |
Model | Compression algorithm | Dataset | mAP (drop) % | Configuration | Checkpoint |
---|---|---|---|---|---|
Mask‑R‑CNN | - | COCO 2017 | bbox: 37.33, segm: 33.56 | Config | Download |
Mask‑R‑CNN | • QAT: INT8 (per-tensor symmetric for weights, per-tensor asymmetric half-range for activations) | COCO 2017 | bbox: 37.19 (0.14), segm: 33.54 (0.02) | Config | Download |
Mask‑R‑CNN | • Sparsity: 50% (Magnitude) | COCO 2017 | bbox: 36.94 (0.39), segm: 33.23 (0.33) | Config | Download |
ONNX Model | Compression algorithm | Dataset | Accuracy (drop) % |
---|---|---|---|
DenseNet-121 | PTQ | ImageNet | 60.16 (0.8) |
GoogleNet | PTQ | ImageNet | 66.36 (0.3) |
MobileNet V2 | PTQ | ImageNet | 71.38 (0.49) |
ResNet-50 | PTQ | ImageNet | 74.63 (0.21) |
ShuffleNet | PTQ | ImageNet | 47.25 (0.18) |
SqueezeNet V1.0 | PTQ | ImageNet | 54.3 (0.54) |
VGG‑16 | PTQ | ImageNet | 72.02 (0.0) |
ONNX Model | Compression algorithm | Dataset | mAP (drop) % |
---|---|---|---|
SSD1200 | PTQ | COCO2017 | 20.17 (0.17) |
Tiny-YOLOv2 | PTQ | VOC12 | 29.03 (0.23) |
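Unlike QAT, post-training quantization (PTQ) does not retrain the model: it runs a small calibration set through it, collects activation statistics, and derives quantization parameters from them. The simplest statistic is a running minimum and maximum, sketched below as a hypothetical `MinMaxObserver` class that produces an asymmetric 8-bit scale and zero point. NNCF's actual range estimators and presets may use different statistics.

```python
class MinMaxObserver:
    """Track the running min/max of activations seen during calibration."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def update(self, batch):
        """Fold one calibration batch (a list of floats) into the running range."""
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def quant_params(self, num_bits=8):
        """Asymmetric scale and zero point for an unsigned [0, 2^b - 1] grid."""
        qmax = 2 ** num_bits - 1
        lo = min(self.min_val, 0.0)  # keep 0.0 exactly representable
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = round(-lo / scale)
        return scale, zero_point


calibration_batches = [[0.1, 0.5], [-0.2, 0.9], [0.3, 1.1]]
observer = MinMaxObserver()
for batch in calibration_batches:
    observer.update(batch)

scale, zero_point = observer.quant_params()
print(observer.min_val, observer.max_val, zero_point)  # -0.2 1.1 39
```

Min-max is sensitive to outliers, since a single extreme activation stretches the range for every other value; that is the usual motivation for percentile- or mean-based range estimators.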