Remove HMP from optimum-habana #349
@@ -57,44 +57,16 @@ To not take them into account in the computation of the throughput at the end of
## Mixed-Precision Training

Mixed-precision training enables some operations to be computed with lighter data types in order to accelerate training.
Habana Mixed Precision (HMP) proposes to mix *fp32* and *bf16* operations.
Optimum Habana enables mixed-precision training in a similar fashion to 🤗 Transformers:
- the argument `--bf16` enables the use of PyTorch autocast
- the argument `--half_precision_backend [hpu_amp, cpu_amp]` specifies the device on which mixed-precision operations should be performed (see the sketch after this list)
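The same switches can also be set programmatically when building the training arguments. This is a minimal sketch under the assumption that `GaudiTrainingArguments` exposes `bf16` and `half_precision_backend` fields mirroring the command-line flags above; `use_habana`, `use_lazy_mode`, and the output directory are illustrative and not taken from this diff.

```python
from optimum.habana import GaudiTrainingArguments

# Minimal sketch: the programmatic equivalent of passing
# --bf16 and --half_precision_backend hpu_amp on the command line.
training_args = GaudiTrainingArguments(
    output_dir="tmp_trainer",          # placeholder output directory
    use_habana=True,                   # run on Habana Gaudi (HPU)
    use_lazy_mode=True,                # lazy-mode execution, as in the example scripts
    bf16=True,                         # enable PyTorch autocast in bf16
    half_precision_backend="hpu_amp",  # run mixed-precision ops on the HPU
)
```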

<Tip warning={true}>

Please refer to the [list of supported PyTorch operators](https://docs.habana.ai/en/latest/PyTorch/Pytorch_Operators/Pytorch_Operators.html) beforehand to make sure the ones you are interested in are compatible with *bf16*.

</Tip>
Comment on lines -62 to -66:

- I would keep this.
- But those operators are incompatible with autocast. HMP and autocast operate on different software levels. Please see: https://docs.habana.ai/en/latest/PyTorch/PyTorch_Mixed_Precision/Autocast.html#override-options
- No problem, we don't have to keep the same operators. Maybe it will just be easier to refer to GPT2's Gaudi config.

To apply HMP, you must set `"use_habana_mixed_precision"` to `true` in the Gaudi configuration file.
Then, you can specify which operators to compute in *bf16* with `"hmp_bf16_ops"` and which operators to compute in *fp32* with `"hmp_fp32_ops"`.
If these operators are not specified, their default values are those written in the [Gaudi configuration file of BERT](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json), which is a good starting point for applying HMP:
```
"hmp_bf16_ops": [
  "add",
  "addmm",
  "bmm",
  "div",
  "dropout",
  "gelu",
  "iadd",
  "linear",
  "layer_norm",
  "matmul",
  "mm",
  "rsub",
  "softmax",
  "truediv"
],
"hmp_fp32_ops": [
  "embedding",
  "nll_loss",
  "log_softmax"
]
```
Comment on lines -69 to -93:

- I would still keep a part of this to show how to specify custom op lists. We can add a link to the GPT2 Gaudi config when it is updated.
- But shouldn't users provide custom lists in a similar way to other training demos outside of HuggingFace? We can keep those in GaudiConfig to make sure they are optimized for a specific model.
- IMO users should be able to do both, because those already used to Optimum Habana probably have Gaudi configs with custom op lists, so switching to Autocast will be easy and they won't be confused.
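For context, a sketch of how such a Gaudi configuration is typically consumed: the op lists live in the `gaudi_config.json` linked above, and the resulting `GaudiConfig` is handed to `GaudiTrainer`. The constructor arguments below follow the usual 🤗 Trainer pattern and are assumptions, not taken from this diff.

```python
from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

# Sketch only: load the BERT Gaudi configuration referenced above,
# which carries the "hmp_bf16_ops"/"hmp_fp32_ops" lists.
gaudi_config = GaudiConfig.from_pretrained("Habana/bert-large-uncased-whole-word-masking")

model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased-whole-word-masking")

# The Gaudi configuration is then passed to the trainer next to the usual arguments.
trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=GaudiTrainingArguments(output_dir="tmp_trainer", use_habana=True, use_lazy_mode=True),
)
```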

<Tip>
<Tip warning={true}>

regisss marked this conversation as resolved.

Torch Autocast can also be used as a backend for mixed-precision training. You need to add the argument `--bf16` to enable it.
Please refer to the [advanced autocast usage on Gaudi](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Mixed_Precision/Autocast.html) for more information regarding:
- the default autocast operations
- how to override the default autocast operations (see the sketch after this tip)

</Tip>
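To make this concrete, below is a rough sketch of what the `--bf16` flag enables under the hood: PyTorch autocast on the HPU device. It assumes a machine with a Habana PyTorch build (`habana_frameworks`) and lazy-mode execution; it is not taken from this diff.

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device (Habana build assumed)

model = torch.nn.Linear(8, 2).to("hpu")
inputs = torch.randn(4, 8).to("hpu")

# Inside this context, ops on Habana's default bf16 list run in bf16,
# while ops on the fp32 list stay in fp32 (see the advanced autocast link above).
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    outputs = model(inputs)

htcore.mark_step()  # trigger graph execution in lazy mode
```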
- I think we should mention:
  - `use_torch_autocast`, but saying that `--bf16` should be favored, as `use_torch_autocast` is used to define a good pre-defined config
  - `autocast_bf16_ops` and `autocast_fp32_ops`, as "Add support for autocast custom ops in `GaudiTrainer`" (#308) enables users to specify custom op lists, but saying that the default should work for most models
- As discussed by email, regarding `autocast_bf16_ops` and `autocast_fp32_ops`, I'm fine with saying that the env variable way should be favored. But they should still be documented.
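To illustrate the fields named in the comments above, a hypothetical Gaudi configuration could carry the custom op lists as follows; the exact schema is assumed from the discussion and PR #308 rather than from this diff, and the op names are placeholders.

```python
from optimum.habana import GaudiConfig

# Hypothetical sketch based on the review discussion: `use_torch_autocast`
# as the config-level switch (the --bf16 argument should be favored), plus
# custom autocast op lists as introduced by PR #308. Op names are placeholders.
gaudi_config = GaudiConfig(
    use_torch_autocast=True,
    autocast_bf16_ops=["add", "mm", "softmax"],
    autocast_fp32_ops=["nll_loss", "log_softmax"],
)
```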