
# Validated Models

## Validated MLPerf Models

| Model | Framework | Support | Example |
|-------|-----------|---------|---------|
| ResNet50 V1.5 | TensorFlow | Yes | Link |
| ResNet50 V1.5 | PyTorch | Yes | Link |
| DLRM | PyTorch | Yes | Link |
| BERT large | TensorFlow | Yes | Link |
| BERT large | PyTorch | Yes | Link |
| SSD ResNet34 | TensorFlow | Yes | Link |
| SSD ResNet34 | PyTorch | Yes | Link |
| RNN-T | PyTorch | Yes | Link |
| 3D-UNet | TensorFlow | Yes | Link |
| 3D-UNet | PyTorch | Yes | Link |

## Validated Quantization Examples

Performance results were tested on 07/26/2022 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores/instance, 10 instances, and batch size 1.

Performance varies by use, configuration, and other factors. See platform configuration for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
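
The ratio columns in the tables below are computed exactly as their headers state: accuracy ratio = (INT8-FP32)/FP32 and performance ratio = INT8/FP32. As a quick sanity check, this minimal Python snippet recomputes both ratios from the ResNet50 V1.5 row of the TensorFlow table:

```python
# Recompute the two ratios used throughout the tables below,
# using the ResNet50 V1.5 row of the TensorFlow table as input.
int8_acc, fp32_acc = 76.22, 76.46      # accuracy (%)
int8_thr, fp32_thr = 1355.03, 423.41   # throughput (samples/sec)

acc_ratio = (int8_acc - fp32_acc) / fp32_acc  # (INT8-FP32)/FP32
perf_ratio = int8_thr / fp32_thr              # INT8/FP32

print(f"accuracy ratio: {acc_ratio:.2%}")     # -0.31%
print(f"performance ratio: {perf_ratio:.2f}x")  # 3.20x
```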

### TensorFlow models with Intel TensorFlow 2.9.1

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
|-------|---------------|---------------|-----------------------------------|-------------------------------|-------------------------------|-------------------------------|---------|
| BERT large SQuAD | 92.43 | 92.99 | -0.60% | 25.37 | 12.55 | 2.02x | pb |
| DenseNet121 | 73.57% | 72.89% | 0.93% | 368.05 | 328.84 | 1.12x | pb |
| DenseNet161 | 76.24% | 76.29% | -0.07% | 219.08 | 179.2 | 1.22x | pb |
| DenseNet169 | 74.40% | 74.65% | -0.33% | 295.26 | 260.27 | 1.13x | pb |
| Faster R-CNN Inception ResNet V2 | 37.98% | 38.33% | -0.91% | 3.97 | 2.34 | 1.70x | pb |
| Faster R-CNN Inception ResNet V2 | 37.84% | 38.33% | -1.28% | 4 | 2.32 | 1.73x | SavedModel |
| Faster R-CNN ResNet101 | 30.28% | 30.39% | -0.36% | 70.32 | 20.19 | 3.48x | pb |
| Faster R-CNN ResNet101 | 30.37% | 30.39% | -0.07% | 70.3 | 17.1 | 4.11x | SavedModel |
| Faster R-CNN ResNet50 | 26.57% | 26.59% | -0.08% | 83.32 | 24.38 | 3.42x | pb |
| Inception ResNet V2 | 80.28% | 80.40% | -0.15% | 287.05 | 136.78 | 2.10x | pb |
| Inception V1 | 70.48% | 69.74% | 1.06% | 2208.41 | 977.76 | 2.26x | pb |
| Inception V2 | 74.36% | 73.97% | 0.53% | 1847.6 | 828.03 | 2.23x | pb |
| Inception V3 | 76.71% | 76.75% | -0.05% | 1036.18 | 373.61 | 2.77x | pb |
| Inception V4 | 80.20% | 80.27% | -0.09% | 592.46 | 206.96 | 2.86x | pb |
| Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.07 | 51 | 2.59x | pb |
| Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.41 | 50.94 | 2.60x | ckpt |
| MobileNet V1 | 71.79% | 70.96% | 1.17% | 3603.94 | 1304.58 | 2.76x | pb |
| MobileNet V2 | 71.89% | 71.76% | 0.18% | 2433.87 | 1446.1 | 1.68x | pb |
| ResNet101 | 77.50% | 76.45% | 1.37% | 874.26 | 356.84 | 2.45x | pb |
| ResNet50 Fashion | 78.06% | 78.12% | -0.08% | 3776.14 | 2160.52 | 1.75x | pb |
| ResNet50 V1.0 | 74.11% | 74.27% | -0.22% | 1511.74 | 459.43 | 3.29x | pb |
| ResNet50 V1.5 | 76.22% | 76.46% | -0.31% | 1355.03 | 423.41 | 3.20x | pb |
| ResNet V2 101 | 72.67% | 71.87% | 1.11% | 436.34 | 323.15 | 1.35x | pb |
| ResNet V2 152 | 73.03% | 72.37% | 0.91% | 311.93 | 222.83 | 1.40x | pb |
| ResNet V2 50 | 70.33% | 69.64% | 0.99% | 766.83 | 574.76 | 1.33x | pb |
| SSD MobileNet V1 | 22.97% | 23.13% | -0.69% | 959.72 | 586.21 | 1.64x | pb |
| SSD MobileNet V1 | 22.99% | 23.13% | -0.61% | 953.4 | 412.06 | 2.31x | ckpt |
| SSD ResNet34 | 21.69% | 22.09% | -1.81% | 44.53 | 11.86 | 3.75x | pb |
| SSD ResNet50 V1 | 37.86% | 38.00% | -0.37% | 69.09 | 25.93 | 2.66x | pb |
| SSD ResNet50 V1 | 37.81% | 38.00% | -0.50% | 69.02 | 21.06 | 3.28x | ckpt |
| VGG16 | 72.66% | 70.89% | 2.50% | 660.05 | 177.23 | 3.72x | pb |
| VGG19 | 72.72% | 71.01% | 2.41% | 560.39 | 147.27 | 3.81x | pb |
| Wide & Deep | 77.62% | 77.67% | -0.07% | 23329.18 | 20930.18 | 1.11x | pb |
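
The Example column records the format of the reference model: a frozen graph (pb), a SavedModel directory, or a checkpoint (ckpt). As an illustration only, with hypothetical file paths, the two most common formats load differently in TF 2.x:

```python
import tensorflow as tf

# SavedModel directory: the TF2-native format (hypothetical path).
saved = tf.saved_model.load("resnet50_v1_5_savedmodel/")

# Frozen graph (.pb): a serialized GraphDef, loaded via the TF1 compat API.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("resnet50_v1_5_fp32.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
with tf.Graph().as_default() as graph:
    tf.compat.v1.import_graph_def(graph_def, name="")
```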

### PyTorch models with Torch 1.12.0+cpu in PTQ mode

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
|-------|---------------|---------------|-----------------------------------|-------------------------------|-------------------------------|-------------------------------|---------|
| ALBERT base MRPC | 88.85% | 88.50% | 0.40% | 34.46 | 26.88 | 1.28x | eager |
| Barthez MRPC | 83.92% | 83.81% | 0.14% | 161.06 | 89.61 | 1.80x | eager |
| BERT base COLA | 58.80% | 58.84% | -0.07% | 262.88 | 125.63 | 2.09x | fx |
| BERT base MRPC | 89.90% | 90.69% | -0.88% | 244.27 | 125.28 | 1.95x | fx |
| BERT base RTE | 69.31% | 69.68% | -0.52% | 259.21 | 125.72 | 2.06x | fx |
| BERT base SST2 | 91.06% | 91.86% | -0.87% | 262.73 | 125.69 | 2.09x | fx |
| BERT base STSB | 89.10% | 89.75% | -0.72% | 254.36 | 125.9 | 2.02x | fx |
| BERT large COLA | 64.12% | 62.57% | 2.48% | 89.36 | 36.47 | 2.45x | fx |
| BERT large MRPC | 89.50% | 90.38% | -0.97% | 88.92 | 36.55 | 2.43x | fx |
| BERT large QNLI | 90.90% | 91.82% | -1.00% | 90.39 | 36.63 | 2.47x | fx |
| CamemBERT base MRPC | 86.70% | 86.82% | -0.14% | 236.6 | 121.81 | 1.94x | fx |
| Deberta MRPC | 90.88% | 90.91% | -0.04% | 149.76 | 84.72 | 1.77x | eager |
| DistilBERT base MRPC | 88.23% | 89.16% | -1.05% | 426.4 | 246.13 | 1.73x | eager |
| FlauBERT MRPC | 79.87% | 80.19% | -0.40% | 675.82 | 437.72 | 1.54x | eager |
| Inception V3 | 69.43% | 69.52% | -0.13% | 490.32 | 209.87 | 2.34x | eager |
| Longformer MRPC | 91.01% | 91.46% | -0.49% | 20.36 | 16.65 | 1.22x | eager |
| mBart WNLI | 56.34% | 56.34% | 0.00% | 66.23 | 30.86 | 2.15x | eager |
| lvwerra/pegasus-samsum | 42.39 | 42.67 | -0.67% | 3.86 | 1.14 | 3.38x | eager |
| PeleeNet | 71.64% | 72.10% | -0.64% | 511.56 | 387.9 | 1.32x | eager |
| ResNet18 | 69.57% | 69.76% | -0.27% | 823.22 | 386.93 | 2.13x | eager |
| ResNet18 | 69.57% | 69.76% | -0.28% | 816.8 | 385.23 | 2.12x | fx |
| ResNet50 | 75.98% | 76.15% | -0.21% | 515.14 | 204 | 2.53x | eager |
| ResNeXt101_32x8d | 79.08% | 79.31% | -0.29% | 210.39 | 74.87 | 2.81x | eager |
| RNNT | 92.48 | 92.55 | -0.08% | 74.17 | 20.38 | 3.64x | eager |
| Roberta Base MRPC | 88.25% | 88.18% | 0.08% | 245.05 | 123.53 | 1.98x | eager |
| Se_ResNeXt50_32x4d | 78.98% | 79.08% | -0.13% | 370.11 | 172.45 | 2.15x | eager |
| SqueezeBERT MRPC | 86.87% | 87.65% | -0.89% | 241.25 | 206.03 | 1.17x | eager |
| Transfo-xl MRPC | 81.97% | 81.20% | 0.94% | 11.2 | 8.31 | 1.35x | eager |
| xlm-roberta-base_MRPC | 88.03% | 88.62% | -0.67% | 140.58 | 122.29 | 1.15x | eager |
| YOLOv3 | 24.60% | 24.54% | 0.21% | 110.54 | 39.46 | 2.80x | eager |
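
In the Example column, eager and fx name the two PyTorch quantization workflows (module swapping vs. graph tracing). The rows above come from the repository's example recipes; as a rough orientation only, here is a minimal sketch of eager-mode post-training static quantization in stock PyTorch, with a toy model standing in for the networks above:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

# Toy stand-in for the FP32 models in the table; QuantStub/DeQuantStub
# mark where tensors cross the float<->INT8 boundary in eager mode.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.relu(self.fc(self.quant(x))))

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")  # x86 backend
prepared = tq.prepare(model)                      # insert observers

# Calibration: run representative data so observers record activation ranges.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(32, 16))

model_int8 = tq.convert(prepared)  # swap observed modules for INT8 kernels
print(model_int8)
```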

### PyTorch models with Torch 1.12.0+cpu in QAT mode

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
|-------|---------------|---------------|-----------------------------------|-------------------------------|-------------------------------|-------------------------------|---------|
| ResNet18 | 69.84% | 69.76% | 0.11% | 805.76 | 389.14 | 2.07x | eager |
| ResNet18 | 69.74% | 69.76% | -0.03% | 822.99 | 391.82 | 2.10x | fx |
| BERT base MRPC QAT | 89.70% | 89.50% | 0.22% | 173.83 | 107.22 | 1.62x | fx |
| ResNet50 | 76.05% | 76.15% | -0.13% | 500.54 | 195.5 | 2.56x | eager |
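
QAT differs from PTQ by inserting fake-quantization modules and fine-tuning, so the weights learn to tolerate INT8 rounding; this is why QAT accuracy often matches or exceeds the FP32 baseline, as in the ResNet18 rows above. A minimal stock-PyTorch sketch with a toy model and a dummy training loop (placeholders, not the recipes used above):

```python
import torch
import torch.nn as nn
import torch.quantization as tq

# Toy stand-in model; QuantStub/DeQuantStub mark the INT8 boundaries.
model = nn.Sequential(tq.QuantStub(), nn.Linear(16, 4), nn.ReLU(), tq.DeQuantStub())
model.train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
prepared = tq.prepare_qat(model)  # insert fake-quant + observer modules

# Placeholder fine-tuning loop: use the real task loss in practice.
opt = torch.optim.SGD(prepared.parameters(), lr=1e-3)
for _ in range(10):
    loss = prepared(torch.randn(32, 16)).square().mean()  # dummy loss
    opt.zero_grad()
    loss.backward()
    opt.step()

model_int8 = tq.convert(prepared.eval())  # finalize to real INT8 modules
```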

### PyTorch models with Torch 1.12.0+cpu IPEX

Throughput (samples/sec) was measured with 1 socket, 10 instances, and 4 cores/instance (1s 10ins 4c/ins), at batch sizes 8 and 1.

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (bs=8) | FP32 Throughput (bs=8) | Performance Ratio [INT8/FP32] (bs=8) | INT8 Throughput (bs=1) | FP32 Throughput (bs=1) | Performance Ratio [INT8/FP32] (bs=1) | Example |
|-------|---------------|---------------|-----------------------------------|------------------------|------------------------|--------------------------------------|------------------------|------------------------|--------------------------------------|---------|
| SSD ResNet34 | 19.99% | 20.00% | -0.06% | 11.19 | 8.72 | 1.28x | 11.63 | 9.82 | 1.18x | ipex |

### PyTorch models with Torch 1.11.0+cpu IPEX

Throughput (samples/sec) was measured with 1 socket, 10 instances, and 4 cores/instance (1s 10ins 4c/ins), at batch sizes 64 and 1.

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (bs=64) | FP32 Throughput (bs=64) | Performance Ratio [INT8/FP32] (bs=64) | INT8 Throughput (bs=1) | FP32 Throughput (bs=1) | Performance Ratio [INT8/FP32] (bs=1) | Example |
|-------|---------------|---------------|-----------------------------------|-------------------------|-------------------------|---------------------------------------|------------------------|------------------------|--------------------------------------|---------|
| ResNet18 | 69.48% | 69.76% | -0.40% | 3968.05 | 1182.36 | 3.36x | 2995.8 | 1140.63 | 2.63x | ipex |
| ResNeXt101_32x16d_wsl | 84.26% | 84.17% | 0.11% | 195.72 | 51.28 | 3.82x | 178.01 | 59.12 | 3.01x | ipex |
| ResNet50 | 76.07% | 76.15% | -0.10% | 1731.8 | 474.24 | 3.65x | 1347.64 | 518.39 | 2.60x | ipex |

### ONNX models with ONNX Runtime 1.11.0

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] | Example |
|-------|---------------|---------------|-----------------------------------|-------------------------------|-------------------------------|-------------------------------|---------|
| AlexNet | 54.74% | 54.79% | -0.09% | 1498.07 | 649.99 | 2.30x | qdq |
| BERT base MRPC DYNAMIC | 85.54% | 86.03% | -0.57% | 381.45 | 156.05 | 2.44x | qlinearops |
| BERT base MRPC STATIC | 85.29% | 86.03% | -0.86% | 766.09 | 316.6 | 2.42x | qlinearops |
| BERT SQuAD | 80.44 | 80.67 | -0.29% | 116.88 | 64.59 | 1.81x | qlinearops |
| BERT SQuAD | 80.44 | 80.67 | -0.29% | 116.93 | 64.64 | 1.81x | qdq |
| BiDAF | 65.92% | 66.08% | -0.24% | 1468.58 | 1406.21 | 1.04x | qlinearops |
| CaffeNet | 56.26% | 56.30% | -0.07% | 2750.7 | 812.73 | 3.38x | qdq |
| DistilBERT base MRPC | 84.56% | 84.56% | 0.00% | 1654.95 | 595.32 | 2.78x | qlinearops |
| EfficientNet | 77.58% | 77.70% | -0.15% | 2066.02 | 1096.86 | 1.88x | qlinearops |
| FCN | 64.66% | 64.98% | -0.49% | 15.13 | 7.2 | 2.10x | qlinearops |
| GoogleNet | 67.67% | 67.79% | -0.18% | 1174.98 | 807.68 | 1.45x | qdq |
| Inception V1 | 67.23% | 67.24% | -0.01% | 1181.71 | 831.01 | 1.42x | qdq |
| MobileBERT MRPC | 86.03% | 86.27% | -0.28% | 774.96 | 678.66 | 1.14x | qlinearops |
| MobileBERT SQuAD MLPerf | 89.84 | 90.03 | -0.20% | 104.51 | 94.88 | 1.10x | qlinearops |
| MobileNet V2 | 65.47% | 66.89% | -2.12% | 5172.04 | 3312.76 | 1.56x | qlinearops |
| MobileNet V3 MLPerf | 75.59% | 75.74% | -0.20% | 4168.8 | 2146.59 | 1.94x | qlinearops |
| ResNet50 V1.5 MLPerf | 76.13% | 76.46% | -0.43% | 1154.73 | 554.69 | 2.08x | qlinearops |
| ResNet50 V1.5 | 72.28% | 72.29% | -0.01% | 1156.05 | 555.72 | 2.08x | qlinearops |
| ResNet50 V1.5 (ONNX Model Zoo) | 74.76% | 74.99% | -0.31% | 1347.89 | 588.84 | 2.29x | qlinearops |
| ResNet50 V1.5 (ONNX Model Zoo) | 74.75% | 74.99% | -0.32% | 840.87 | 588.77 | 1.43x | qdq |
| Roberta Base MRPC | 90.44% | 89.95% | 0.54% | 811.74 | 312.93 | 2.59x | qlinearops |
| Tiny YOLOv3 | 12.08% | 12.43% | -2.82% | 801.46 | 653.42 | 1.23x | qlinearops |
| VGG16 | 66.60% | 66.69% | -0.13% | 312.98 | 128.7 | 2.43x | qlinearops |
| VGG16 (ONNX Model Zoo) | 72.28% | 72.40% | -0.17% | 450.47 | 130.74 | 3.45x | qlinearops |
| YOLOv3 | 26.88% | 28.74% | -6.47% | 157.58 | 66.62 | 2.37x | qlinearops |
| ZFNet | 55.89% | 55.96% | -0.13% | 658.93 | 359.42 | 1.83x | qdq |
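
In the Example column, qlinearops denotes the QOperator format (fused QLinear* operators) and qdq denotes explicit QuantizeLinear/DequantizeLinear node pairs. The stock onnxruntime.quantization API exposes the same format choice; a minimal sketch of static quantization, with hypothetical model paths and a random-data calibration reader as a placeholder:

```python
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)

class RandomReader(CalibrationDataReader):
    """Placeholder calibration reader; feed real preprocessed samples instead."""
    def __init__(self, n=8):
        # "input" is a hypothetical input tensor name for the model below.
        self.batches = iter(
            [{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
             for _ in range(n)])

    def get_next(self):
        return next(self.batches, None)

# QuantFormat.QOperator -> fused QLinear* ops ("qlinearops" in the table);
# QuantFormat.QDQ       -> explicit Q/DQ node pairs ("qdq" in the table).
quantize_static(
    "resnet50_fp32.onnx",   # hypothetical input model path
    "resnet50_int8.onnx",   # output model path
    calibration_data_reader=RandomReader(),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```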

### MXNet models with MXNet 1.7.0

| Model | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec) | FP32 Throughput (samples/sec) | Performance Ratio [INT8/FP32] |
|-------|---------------|---------------|-----------------------------------|-------------------------------|-------------------------------|-------------------------------|
| Inception V3 | 77.80% | 77.65% | 0.20% | 922.38 | 277.59 | 3.32x |
| MobileNet V1 | 71.60% | 72.23% | -0.86% | 6614.69 | 2560.42 | 2.58x |
| MobileNet V3 MLPerf | 70.80% | 70.87% | -0.10% | 5230.58 | 2024.85 | 2.58x |
| ResNet v1 152 | 78.28% | 78.54% | -0.33% | 578.27 | 156.38 | 3.70x |
| ResNet50 V1.0 | 75.91% | 76.33% | -0.55% | 1571.33 | 429.53 | 3.66x |
| SqueezeNet | 56.80% | 56.97% | -0.28% | 4712.15 | 1323.68 | 3.56x |
| SSD MobileNet V1 | 74.94% | 75.54% | -0.79% | 768.59 | 191.55 | 4.01x |
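
MXNet INT8 quantization in this generation is likewise calibration-based. As a loose sketch only, using mxnet.contrib.quantization.quantize_model; the checkpoint prefix and calibration data are placeholders, and the exact keyword set may vary between MXNet 1.x releases:

```python
import mxnet as mx
import numpy as np
from mxnet.contrib.quantization import quantize_model

# Hypothetical checkpoint prefix; returns (symbol, arg_params, aux_params).
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet50_v1", 0)

# Placeholder calibration iterator; use real preprocessed images in practice.
calib_data = mx.io.NDArrayIter(
    data=np.random.rand(64, 3, 224, 224).astype("float32"), batch_size=8)

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),
    calib_mode="naive",        # min/max calibration over the calib set
    calib_data=calib_data,
    num_calib_examples=64,
    quantized_dtype="int8")
```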

## Validated Pruning Examples

| Tasks | Framework | Model | FP32 Baseline | Gradient Sensitivity, 20% Sparsity: Accuracy% | Drop | Perf Gain (sample/s) | +ONNX Dynamic Quantization on Pruned Model: Accuracy% | Drop | Perf Gain (sample/s) |
|-------|-----------|-------|---------------|-----------------------------------------------|------|----------------------|-------------------------------------------------------|------|----------------------|
| SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x |
| QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x |

| Tasks | Framework | Model | FP32 Baseline | Pattern Lock, 70% Unstructured Sparsity: Accuracy% | Drop | Pattern Lock, 50% 1:2 Structured Sparsity: Accuracy% | Drop |
|-------|-----------|-------|---------------|----------------------------------------------------|------|------------------------------------------------------|------|
| MNLI | PyTorch | BERT base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80] |
| SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13 |
| QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31] |
| QNLI | PyTorch | BERT base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73 |
| QnA | PyTorch | BERT base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69] |

| Framework | Model | FP32 Baseline | Compression | Dataset | Accuracy% (Drop) |
|-----------|-------|---------------|-------------|---------|------------------|
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Magnitude | ImageNet | 69.47 (-0.42) |
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Gradient Sensitivity | ImageNet | 68.85 (-1.30) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude | ImageNet | 76.11 (-0.03) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Post Training Quantization | ImageNet | 76.01 (-0.16) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Quantization Aware Training | ImageNet | 75.90 (-0.30) |
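
In the last table, "30% Sparsity on Magnitude" means the 30% smallest-magnitude weights are zeroed. A minimal sketch of that idea using PyTorch's built-in pruning utilities (not the repository's own pruning API; the model is a toy placeholder):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a real network such as ResNet18/ResNet50.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Zero the 30% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Verify the resulting sparsity is ~30%.
zeros = sum((m.weight == 0).sum().item() for m in model if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"sparsity: {zeros / total:.1%}")
```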

## Validated Knowledge Distillation Examples

| Example Name | Dataset | Student (Accuracy) | Teacher (Accuracy) | Student With Distillation (Accuracy Improvement) |
|--------------|---------|--------------------|--------------------|--------------------------------------------------|
| ResNet example | ImageNet | ResNet18 (0.6739) | ResNet50 (0.7399) | 0.6845 (0.0106) |
| BlendCNN example | MRPC | BlendCNN (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
| BiLSTM example | SST-2 | BiLSTM (0.7913) | RoBERTa-Base (0.9404) | 0.8085 (0.0172) |
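
In these examples the student is trained on a blend of the hard-label loss and the teacher's softened outputs. A minimal sketch of that standard distillation objective; the temperature and weighting below are hypothetical hyperparameters, not the values used in the examples above:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend of soft-target KL term and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale by T^2 so gradients keep their magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 10-class problem.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))
```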