Optimized inference of Cohere model on HPU
Signed-off-by: Ye, Xinyu <[email protected]>
XinyuYe-Intel committed Sep 12, 2024
1 parent 0027e32 commit 613845b
Showing 7 changed files with 501 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -212,6 +212,7 @@ The following model architectures, tasks and device distributions have been validated
| Persimmon | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Qwen2 | <div style="text-align:left"><li>Single card</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Gemma | :heavy_check_mark: | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Cohere | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| T5 / Flan T5 | :heavy_check_mark: | :heavy_check_mark: | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| BART | | <div style="text-align:left"><li>Single card</li></div> | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| ViT | :heavy_check_mark: | :heavy_check_mark: | <li>[image classification](https://github.com/huggingface/optimum-habana/tree/main/examples/image-classification)</li> |
1 change: 1 addition & 0 deletions docs/source/index.mdx
@@ -58,6 +58,7 @@ In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated
| Gemma || <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Qwen2 | <div style="text-align:left"><li>Single card</li></div> | <div style="text-align:left"><li>Single card</li></div> | <li>[language modeling](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling)</li><li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Persimmon | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| Cohere | | <div style="text-align:left"><li>Single card</li></div> | <li>[text generation](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)</li> |
| T5 / Flan T5 ||| <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| BART | | <div style="text-align:left"><li>Single card</li></div> | <li>[summarization](https://github.com/huggingface/optimum-habana/tree/main/examples/summarization)</li><li>[translation](https://github.com/huggingface/optimum-habana/tree/main/examples/translation)</li><li>[question answering](https://github.com/huggingface/optimum-habana/tree/main/examples/question-answering#fine-tuning-t5-on-squad20)</li> |
| ViT ||| <li>[image classification](https://github.com/huggingface/optimum-habana/tree/main/examples/image-classification)</li> |
1 change: 1 addition & 0 deletions optimum/habana/transformers/generation/utils.py
@@ -105,6 +105,7 @@
    "stablelm",
    "mamba",
    "deci",
    "cohere",
]
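The `"cohere"` entry above registers the model type in a list that the generation code checks by simple membership. A minimal sketch of such a gate — the function name `uses_static_shapes` is illustrative, not the library's actual API:

```python
# Illustrative sketch of a model-type membership gate like the list above
# feeds; names are assumptions, not the actual optimum-habana implementation.
MODELS_OPTIMIZED_WITH_STATIC_SHAPES = [
    "stablelm",
    "mamba",
    "deci",
    "cohere",
]

def uses_static_shapes(model_type: str) -> bool:
    """Return True if this model type gets static-shape generation optimizations."""
    return model_type in MODELS_OPTIMIZED_WITH_STATIC_SHAPES

print(uses_static_shapes("cohere"))  # True
print(uses_static_shapes("bert"))    # False
```

Adding a single lowercase model-type string is all this hunk needs; the rest of the generation path keys off the same identifier that `config.model_type` carries.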


10 changes: 10 additions & 0 deletions optimum/habana/transformers/modeling_utils.py
@@ -40,6 +40,8 @@
    GaudiCLIPVisionTransformer,
    GaudiCodeGenAttention,
    GaudiCodeGenForCausalLM,
    GaudiCohereDecoderLayer,
    GaudiCohereForCausalLM,
    GaudiFalconAttention,
    GaudiFalconDecoderLayer,
    GaudiFalconForCausalLM,
@@ -130,6 +132,8 @@
    gaudi_codegen_block_forward,
    gaudi_codegen_model_forward,
    gaudi_conv1d_forward,
    gaudi_cohere_attention_forward,
    gaudi_cohere_model_forward,
    gaudi_DetrConvModel_forward,
    gaudi_esm_for_protein_folding_forward,
    gaudi_esmfolding_trunk_forward,
@@ -561,3 +565,9 @@ def adapt_transformers_to_gaudi():

    transformers.AutoConfig.register("deci", DeciLMConfig)
    transformers.AutoModelForCausalLM.register(DeciLMConfig, DeciLMForCausalLM)

    # Optimization for cohere on Gaudi
    transformers.models.cohere.modeling_cohere.CohereDecoderLayer = GaudiCohereDecoderLayer
    transformers.models.cohere.modeling_cohere.CohereForCausalLM = GaudiCohereForCausalLM
    transformers.models.cohere.modeling_cohere.CohereModel.forward = gaudi_cohere_model_forward
    transformers.models.cohere.modeling_cohere.CohereAttention.forward = gaudi_cohere_attention_forward
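The hunk above swaps the stock Cohere classes and `forward` methods for Gaudi-optimized versions by reassigning attributes on the imported `transformers` module at patch time, so anything that later instantiates the model through `transformers` picks up the optimized code. A self-contained sketch of this monkey-patching pattern, with dummy stand-ins for the real transformers and optimum-habana objects:

```python
# Minimal sketch of the attribute-patching pattern used by
# adapt_transformers_to_gaudi(). The module and classes below are dummies
# standing in for transformers.models.cohere.modeling_cohere.
import types

modeling_cohere = types.ModuleType("modeling_cohere")  # stand-in module

class CohereModel:
    def forward(self):
        return "stock forward"

modeling_cohere.CohereModel = CohereModel

def gaudi_cohere_model_forward(self):
    # Stand-in for the Gaudi-optimized forward pass.
    return "gaudi-optimized forward"

def adapt_to_gaudi():
    # Same pattern as the commit: rebind the method on the original class,
    # so existing and future instances all take the optimized path.
    modeling_cohere.CohereModel.forward = gaudi_cohere_model_forward

adapt_to_gaudi()
print(modeling_cohere.CohereModel().forward())  # gaudi-optimized forward
```

The design choice worth noting: patching the upstream classes in place (rather than shipping a parallel model class) lets existing `AutoModelForCausalLM.from_pretrained(...)` call sites work unchanged on HPU.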
6 changes: 6 additions & 0 deletions optimum/habana/transformers/models/__init__.py
@@ -45,6 +45,12 @@
    gaudi_codegen_block_forward,
    gaudi_codegen_model_forward,
)
from .cohere import (
    GaudiCohereDecoderLayer,
    GaudiCohereForCausalLM,
    gaudi_cohere_attention_forward,
    gaudi_cohere_model_forward,
)
from .decilm import (
    DeciLMConfig,
    DeciLMForCausalLM,
6 changes: 6 additions & 0 deletions optimum/habana/transformers/models/cohere/__init__.py
@@ -0,0 +1,6 @@
from .modeling_cohere import (
    GaudiCohereDecoderLayer,
    GaudiCohereForCausalLM,
    gaudi_cohere_attention_forward,
    gaudi_cohere_model_forward,
)
476 changes: 476 additions & 0 deletions optimum/habana/transformers/models/cohere/modeling_cohere.py (diff not loaded in this view)
