Improve MPT fp8 #1256

Merged on Sep 23, 2024 (25 commits)

Commits:
- b9587ca: Enable MPT fp8 support (atakaha, Aug 2, 2024)
- 51ddf87: Fix cache position issue in mixtral (#1272) (tthakkal, Aug 20, 2024)
- 60ae80b: Add temporary directories to test_trainer.py (regisss, Aug 20, 2024)
- 18249d4: Fix memory regression for modeling llama (#1271) (libinta, Aug 20, 2024)
- 0d3e0f4: Fix profiling step with device finish execution for text-generation (… (libinta, Aug 22, 2024)
- 18efdc1: Revert mark_step in mixtral model from PR #1260 (#1273) (yeonsily, Aug 22, 2024)
- 7891a86: Remove huggingface_hub install that is no longer needed in the kubern… (dmsuehir, Aug 23, 2024)
- 5566721: Add missing condtion check in tensor creation in greedy search (#1288) (yeonsily, Aug 23, 2024)
- a909e6b: Fix BERT FSDP test (#1281) (regisss, Aug 23, 2024)
- 5c567d3: Llava: Added flash_attention_recompute arg to provide an option to en… (tthakkal, Aug 23, 2024)
- 886ab67: Get seq len fix propagate (#1291) (ssarkar2, Aug 23, 2024)
- 9a51d34: Update last stable release in README (regisss, Aug 25, 2024)
- 08e30aa: Update minimal required versions in examples (regisss, Aug 28, 2024)
- 2e6a0da: Update FusedSDPA calling method as Gaudi documentation (#1285) (yeonsily, Aug 29, 2024)
- e019bce: Mixtral fp8 tests (#1269) (imangohari1, Aug 29, 2024)
- b05a1a5: Switch failed code quality check comment to `workflow_run` (#1297) (regisss, Aug 29, 2024)
- 7c409ad: Potential fix for the failed code quality check comment workflow (#1299) (regisss, Aug 30, 2024)
- 5092e4c: Potential fix 2 for failed code quality check comment workflow (regisss, Aug 30, 2024)
- 9a29cc2: Potential fix 3 for failed code quality check workflow (regisss, Aug 30, 2024)
- 46c2d59: Other potentiel fix (regisss, Aug 30, 2024)
- f9d46eb: New potential fix (regisss, Aug 30, 2024)
- e7d62b3: Enabling Text to Video Diffusion Model Generation (#1109) (pi314ever, Aug 30, 2024)
- fe8ae86: Prevent Graph break in Llama when using flash attention (#1301) (pramodkumar-habanalabs, Aug 30, 2024)
- dc7d72e: Enable MPT fp8 support (atakaha, Aug 2, 2024)
- 46446b8: Merge branch 'huggingface:main' into mpt_fp8 (tthakkal, Sep 3, 2024)
optimum/habana/transformers/modeling_utils.py (8 changes: 4 additions & 4 deletions)

@@ -76,6 +76,8 @@
     GaudiMixtralDecoderLayer,
     GaudiMixtralForCausalLM,
     GaudiMixtralModel,
+    GaudiMptAttention,
+    GaudiMptBlock,
     GaudiMptForCausalLM,
     GaudiMptModel,
     GaudiOPTForCausalLM,
@@ -152,8 +154,6 @@
     gaudi_mistral_rmsnorm_forward,
     gaudi_mixtral_block_sparse_moe_forward,
     gaudi_mixtral_rmsnorm_forward,
-    gaudi_mpt_attention_forward,
-    gaudi_mpt_block_forward,
     gaudi_opt_attention_forward,
     gaudi_opt_decoder_forward,
     gaudi_opt_decoder_layer_forward,
@@ -420,8 +420,8 @@ def adapt_transformers_to_gaudi():
     # Optimization for mpt on Gaudi
     transformers.models.mpt.modeling_mpt.MptForCausalLM = GaudiMptForCausalLM
     transformers.models.mpt.modeling_mpt.MptModel = GaudiMptModel
-    transformers.models.mpt.modeling_mpt.MptAttention.forward = gaudi_mpt_attention_forward
-    transformers.models.mpt.modeling_mpt.MptBlock.forward = gaudi_mpt_block_forward
+    transformers.models.mpt.modeling_mpt.MptAttention = GaudiMptAttention
+    transformers.models.mpt.modeling_mpt.MptBlock = GaudiMptBlock
 
     # Optimization for mistral on Gaudi
     transformers.models.mistral.modeling_mistral.MistralForCausalLM = GaudiMistralForCausalLM
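The hunk above changes how MPT is patched: instead of reassigning only the forward methods (the removed gaudi_mpt_attention_forward / gaudi_mpt_block_forward functions), the whole MptAttention and MptBlock classes are replaced with Gaudi variants, which also lets the Gaudi versions customize construction, not just the forward pass. A minimal sketch of the resulting patching pattern, restating what the diff shows and assuming the Gaudi classes are importable from optimum.habana.transformers.models (as the second file in this PR exports them):

# Sketch of the class-level patching pattern adopted in this diff; names come
# from the hunk above, and optimum-habana with this PR is assumed installed.
import transformers

from optimum.habana.transformers.models import (
    GaudiMptAttention,
    GaudiMptBlock,
    GaudiMptForCausalLM,
    GaudiMptModel,
)


def patch_mpt_for_gaudi():
    # Replacing the classes (rather than assigning new forward methods) means the
    # Gaudi variants can also override __init__, e.g. to set up fp8-friendly
    # submodules, instead of being limited to a swapped forward().
    transformers.models.mpt.modeling_mpt.MptForCausalLM = GaudiMptForCausalLM
    transformers.models.mpt.modeling_mpt.MptModel = GaudiMptModel
    transformers.models.mpt.modeling_mpt.MptAttention = GaudiMptAttention
    transformers.models.mpt.modeling_mpt.MptBlock = GaudiMptBlock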
optimum/habana/transformers/models/__init__.py (4 changes: 2 additions & 2 deletions)

@@ -138,10 +138,10 @@
     gaudi_invert_attention_mask,
 )
 from .mpt import (
+    GaudiMptAttention,
+    GaudiMptBlock,
     GaudiMptForCausalLM,
     GaudiMptModel,
-    gaudi_mpt_attention_forward,
-    gaudi_mpt_block_forward,
 )
 from .opt import (
     GaudiOPTForCausalLM,
optimum/habana/transformers/models/mpt/__init__.py (4 changes: 2 additions & 2 deletions)

@@ -1,6 +1,6 @@
 from .modeling_mpt import (
+    GaudiMptAttention,
+    GaudiMptBlock,
     GaudiMptForCausalLM,
     GaudiMptModel,
-    gaudi_mpt_attention_forward,
-    gaudi_mpt_block_forward,
 )
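Since the mpt subpackage now exports the new classes directly, downstream code can import them by name; a quick sanity check (a sketch, assuming optimum-habana with this PR merged is installed) could be:

# Hypothetical check that the Gaudi MPT classes are exported after this change.
from optimum.habana.transformers.models.mpt import GaudiMptAttention, GaudiMptBlock

print(GaudiMptAttention.__name__, GaudiMptBlock.__name__)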