6. Reference Deployment on Neural Engine
6.1 Dense Reference
6.2 Sparse Reference
Intel Extension for Transformers is a powerful toolkit that provides multiple model optimization techniques for Natural Language Processing (NLP) models, including quantization, pruning, distillation, auto distillation, and orchestration. It also provides the Transformers-accelerated Neural Engine, an optimized backend for NLP models, to demonstrate deployment.
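In practice, enabling these optimizations usually amounts to replacing the Hugging Face `Trainer` with the toolkit's `NLPTrainer` and passing the corresponding config object. The sketch below shows quantization-aware training for one of the models in the table that follows; it assumes the `NLPTrainer`/`QuantizationConfig` API described in the project documentation (import paths have moved between releases, e.g. `intel_extension_for_transformers.optimization` vs. `intel_extension_for_transformers.transformers`), so treat it as illustrative rather than exact.

```python
# Illustrative sketch: quantization-aware training with Intel Extension for Transformers.
# Assumes the NLPTrainer / QuantizationConfig API from the project documentation;
# exact import paths differ between releases.
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from intel_extension_for_transformers.transformers import QuantizationConfig, metrics, objectives
from intel_extension_for_transformers.transformers.trainer import NLPTrainer

model_name = "textattack/bert-base-uncased-MRPC"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize MRPC exactly as in a regular transformers fine-tuning run.
raw = load_dataset("glue", "mrpc")
encoded = raw.map(
    lambda batch: tokenizer(batch["sentence1"], batch["sentence2"],
                            truncation=True, padding="max_length", max_length=128),
    batched=True,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

# NLPTrainer is a drop-in replacement for transformers.Trainer.
trainer = NLPTrainer(
    model=model,
    args=TrainingArguments(output_dir="./qat-bert-mrpc", num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    compute_metrics=compute_metrics,
)

q_config = QuantizationConfig(
    approach="QuantizationAwareTraining",
    metrics=[metrics.Metric(name="eval_accuracy", is_relative=True, criterion=0.01)],
    objectives=[objectives.performance],
)
quantized_model = trainer.quantize(quant_config=q_config)
```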
| Model | Task | Dataset | QuantizationAwareTraining | No Trainer quantization |
|---|---|---|---|---|
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | ✔ | |
| echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid | text-classification | SST-2 | ✔ | |
| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | ✔ |
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | ✔ |
| Model | Task | Dataset | PostTrainingStatic |
|---|---|---|---|
| bert-base-cased-finetuned-mrpc | text-classification | MRPC | ✔ |
| xlnet-base-cased | text-classification | MRPC | ✔ |
| distilgpt2 | language-modeling (CLM) | wikitext | ✔ |
| distilbert-base-cased | language-modeling (MLM) | wikitext | ✔ |
| Rocketknight1/bert-base-uncased-finetuned-swag | multiple-choice | swag | ✔ |
| dslim/bert-base-NER | token-classification | conll2003 | ✔ |
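Post-training static quantization (the two tables above) follows the same pattern but skips retraining: the evaluation set is used for calibration, and only the approach string changes. A minimal sketch under the same API assumptions as the quantization-aware training example above:

```python
# Illustrative sketch: post-training static quantization; no retraining,
# the eval dataset is used for calibration. Reuses the NLPTrainer set up above.
ptq_config = QuantizationConfig(
    approach="PostTrainingStatic",
    metrics=[metrics.Metric(name="eval_accuracy", is_relative=True, criterion=0.01)],
)
quantized_model = trainer.quantize(quant_config=ptq_config)
```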
| Model Name | Datatype | Optimization Method | Model Size (MB) | Accuracy (F1) | Latency (ms) | GFLOPS** | Speedup (vs. BERT Base) |
|---|---|---|---|---|---|---|---|
| BERT Base | fp32 | None | 415.47 | 88.58 | 56.56 | 35.3 | 1x |
| LA-MiniLM | fp32 | Drop and restore, based on MiniLMv2 | 115.04 | 89.28 | 16.99 | 4.76 | 3.33x |
| LA-MiniLM (269, 253, 252, 202, 104, 34)* | fp32 | Evolution search (best config) | 115.04 | 87.76 | 11.44 | 2.49 | 4.94x |
| QuaLA-MiniLM | int8 | Quantization, based on LA-MiniLM | 84.85 | 88.85 | 7.84 | 4.76 | 7.21x |
| QuaLA-MiniLM (315, 251, 242, 159, 142, 33)* | int8 | Evolution search (best config) | 84.86 | 87.68 | 6.41 | 2.55 | 8.82x |
Note: * the length configuration applies to the Length Adaptive model.
Note: ** GFLOPS is the number of multiply-add operations performed during model inference, obtained with the torchprofile tool.
Data was measured on an Intel Xeon Platinum 8280 Scalable processor. For configuration details, please refer to the examples.
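The torchprofile tool counts multiply-accumulate operations for a traced forward pass. A minimal sketch of such a measurement (model and sequence length are illustrative, not the exact benchmark configuration used above):

```python
# Illustrative sketch: counting multiply-add operations with torchprofile.
import torch
from torchprofile import profile_macs
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.config.return_dict = False  # torchprofile traces the model, so return plain tuples
model.eval()

# Dummy batch at an illustrative sequence length.
input_ids = torch.zeros(1, 128, dtype=torch.long)
macs = profile_macs(model, (input_ids,))
print(f"Giga multiply-add operations per forward pass: {macs / 1e9:.2f}")
```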
| Model | Task | Dataset | Pruning Approach | Pruning Type | Framework |
|---|---|---|---|---|---|
| distilbert-base-uncased-distilled-squad | question-answering | SQuAD | BasicMagnitude | Unstructured | Stock PyTorch |
| bert-large-uncased | question-answering | SQuAD | Group LASSO | Structured | Stock PyTorch |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | BasicMagnitude | Unstructured | Stock PyTorch / Intel TensorFlow |
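Magnitude pruning is driven through the same trainer interface via a pruning config. A minimal sketch, assuming the `PrunerConfig`/`PruningConfig` API from the project documentation (parameter names and defaults may vary by release):

```python
# Illustrative sketch: unstructured magnitude pruning to ~90% sparsity.
from intel_extension_for_transformers.transformers import PrunerConfig, PruningConfig, metrics

pruner = PrunerConfig(prune_type="BasicMagnitude", target_sparsity_ratio=0.9)
p_config = PruningConfig(
    pruner_config=[pruner],
    metrics=metrics.Metric(name="eval_accuracy"),
)
pruned_model = trainer.prune(pruning_config=p_config)  # trainer is an NLPTrainer as above
```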
| Student Model | Teacher Model | Task | Dataset |
|---|---|---|---|
| distilbert-base-uncased | bert-base-uncased-SST-2 | text-classification | SST-2 |
| distilbert-base-uncased | bert-base-uncased-QNLI | text-classification | QNLI |
| distilbert-base-uncased | bert-base-uncased-QQP | text-classification | QQP |
| distilbert-base-uncased | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilbert-base-uncased | bert-base-uncased-squad-v1 | question-answering | SQuAD |
| TinyBERT_General_4L_312D | bert-base-uncased-MNLI-v1 | text-classification | MNLI |
| distilroberta-base | roberta-large-cola-krishna2020 | text-classification | CoLA |
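Knowledge distillation pairs each compact student above with a fine-tuned teacher through a distillation config. The sketch below assumes the `trainer.distill` API from the project documentation, with the teacher checkpoint name shown as an illustrative example:

```python
# Illustrative sketch: distilling a fine-tuned SST-2 teacher into a DistilBERT student.
from transformers import AutoModelForSequenceClassification
from intel_extension_for_transformers.transformers import DistillationConfig, metrics

teacher = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"  # illustrative teacher checkpoint
)
d_config = DistillationConfig(metrics=metrics.Metric(name="eval_accuracy"))
distilled_model = trainer.distill(
    distillation_config=d_config,
    teacher_model=teacher,  # trainer wraps the distilbert-base-uncased student
)
```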
| Model | Task | Dataset | Distillation Teacher |
|---|---|---|---|
| google/mobilebert-uncased | language-modeling (MLM) | wikipedia | bert-large-uncased |
| prajjwal1/bert-tiny | language-modeling (MLM) | wikipedia | bert-base-uncased |
| Model | Task | Dataset | Distillation Teacher | Pruning Approach | Pruning Type |
|---|---|---|---|---|---|
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | question-answering | SQuAD | distilbert-base-uncased-distilled-squad | BasicMagnitude | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | PatternLock | Unstructured |
| Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa | text-classification | SST-2 | distilbert-base-uncased-finetuned-sst-2-english | BasicMagnitude | Unstructured |
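Orchestration combines several of the above optimizations in a single fine-tuning pass, for example PatternLock pruning plus distillation as in the table above. A minimal sketch, assuming the `orchestrate_optimizations` method on `NLPTrainer` described in the project documentation:

```python
# Illustrative sketch: one-shot orchestration of pruning and distillation.
from intel_extension_for_transformers.transformers import (
    DistillationConfig, PrunerConfig, PruningConfig, metrics,
)

prune_conf = PruningConfig(pruner_config=[PrunerConfig(prune_type="PatternLock")])
distill_conf = DistillationConfig(metrics=metrics.Metric(name="eval_f1"))
model = trainer.orchestrate_optimizations(
    config_list=[prune_conf, distill_conf],
    teacher_model=teacher,  # teacher prepared as in the distillation sketch
)
```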
| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | question-answering | SQuAD | ✔ | ✔ |
| bhadresh-savani/distilbert-base-uncased-emotion | text-classification | emotion | ✔ | ✔ |
| textattack/bert-base-uncased-MRPC | text-classification | MRPC | ✔ | ✔ |
| textattack/distilbert-base-uncased-MRPC | text-classification | MRPC | ✔ | ✔ |
| Intel/roberta-base-mrpc | text-classification | MRPC | ✔ | ✔ |
| M-FAC/bert-mini-finetuned-mrpc | text-classification | MRPC | ✔ | ✔ |
| gchhablani/bert-base-cased-finetuned-mrpc | text-classification | MRPC | ✔ | ✔ |
| distilbert-base-uncased-finetuned-sst-2-english | text-classification | SST-2 | ✔ | ✔ |
| philschmid/MiniLM-L6-H384-uncased-sst2 | text-classification | SST-2 | ✔ | ✔ |
| moshew/bert-mini-sst2-distilled | text-classification | SST-2 | ✔ | ✔ |
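For deployment, the Transformers-accelerated Neural Engine compiles an exported (and optionally quantized) ONNX model into its own graph format and runs inference on plain NumPy inputs. The sketch below assumes the `backends.neural_engine.compile` entry point used in the repository's deployment examples (the module path has moved across releases); the file name and input shapes are illustrative for a BERT-style model.

```python
# Illustrative sketch: compiling an ONNX model for the Neural Engine and running it.
import numpy as np
from intel_extension_for_transformers.backends.neural_engine.compile import compile

graph = compile("./model.onnx")  # illustrative path to an exported (e.g. INT8) ONNX model

# BERT-style inputs: token ids, segment ids, attention mask (batch=1, seq_len=128).
input_ids = np.zeros((1, 128), dtype=np.int32)
segment_ids = np.zeros((1, 128), dtype=np.int32)
input_mask = np.ones((1, 128), dtype=np.int32)

outputs = graph.inference([input_ids, segment_ids, input_mask])
```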
| Model | Task | Dataset | INT8 | BF16 |
|---|---|---|---|---|
| Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa | question-answering | SQuAD | ✔ | WIP ⭐ |
| Intel/bert-mini-sst2-distilled-sparse-90-1X4-block | text-classification | SST-2 | ✔ | WIP ⭐ |
| Model | Task | Dataset | Early-Exit Type |
|---|---|---|---|
| bert-base-uncased | text-classification | MNLI | SWEET notebook |
| philschmid/tiny-bert-sst2-distilled + textattack/roberta-base-SST-2 | text-classification | SST-2 | TangoBERT notebook |