
Remove HMP from optimum-habana #349

Merged

Conversation

jwieczorekhabana
Contributor

HMP is deprecated in favor of PyTorch autocast

  • removed hmp usage
  • removed setting autocast env variables through GaudiConfig
  • updated tests
  • updated docs
  • updated README.md files
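
For context, a minimal sketch of what the autocast-based flow looks like on Gaudi (a sketch assuming a machine with the Habana PyTorch bridge installed; the model, optimizer and loss function are placeholders):

```
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device (assumed available)

def train_step(model, optimizer, loss_fn, inputs, labels):
    optimizer.zero_grad()
    # Eligible ops run in bf16 under autocast, replacing HMP's op-level casting
    with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
        loss = loss_fn(model(inputs), labels)
    loss.backward()
    htcore.mark_step()  # flush the lazy-mode graph
    optimizer.step()
    htcore.mark_step()
    return loss
```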

@regisss
Collaborator

regisss commented Aug 18, 2023

@jwieczorekhabana Thanks for opening this PR! I'll review it at the beginning of next week and try to speed up merging the autocast PRs.

Collaborator

@regisss regisss left a comment

@jwieczorekhabana I left several comments, the main point being that #308 enables users to specify their own op lists in Gaudi config files, so this should be documented, tested, etc. (which should basically consist of changing a few variable names).

For OPT, I need to check if we need to disable autocast for this specific line.

Note that we should not merge this PR before we manage to make Wav2Vec2 work with autocast. It is being investigated but no fix has been merged yet.

Comment on lines -120 to -129
elif self.gaudi_config.use_torch_autocast:
    # Open temporary files to write mixed-precision ops
    with tempfile.NamedTemporaryFile() as hmp_bf16_file:
        with tempfile.NamedTemporaryFile() as hmp_fp32_file:
            self.gaudi_config.write_bf16_fp32_ops_to_text_files(
                hmp_bf16_file.name,
                hmp_fp32_file.name,
            )
            os.environ["LOWER_LIST"] = str(hmp_bf16_file)
            os.environ["FP32_LIST"] = str(hmp_fp32_file)
Collaborator

We should keep this block, as it makes it possible to specify custom bf16/fp32 ops for autocast.

optimum/habana/transformers/modeling_utils.py (resolved)
Collaborator

Actually, we still need this for GPT2 with autocast; see #308, which I just merged.
Can you rebase your branch and replace the "HMP" mention in the docstring, please?

Comment on lines 81 to 98
def write_bf16_fp32_ops_to_text_files(
    self,
    path_to_bf16_file: Path,
    path_to_fp32_file: Path,
):
    for path, ops in zip(
        [Path(path_to_bf16_file), Path(path_to_fp32_file)], [self.hmp_bf16_ops, self.hmp_fp32_ops]
    ):
        with path.open("w") as text_file:
            # writelines does not add new lines after each element so "\n" is inserted
            text_file.writelines(op + "\n" for op in ops)
Collaborator

We still need this method to be able to specify autocast custom op lists in a Gaudi config
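
For illustration, a rough sketch of how a user could call this method outside the trainer, assuming the hmp_* attributes are renamed to the autocast_* ones proposed further down (the op lists and file names are just examples):

```
from optimum.habana import GaudiConfig

# Hypothetical custom op lists; the defaults should work for most models
gaudi_config = GaudiConfig(
    use_torch_autocast=True,
    autocast_bf16_ops=["add", "linear", "matmul", "softmax"],
    autocast_fp32_ops=["embedding", "nll_loss", "log_softmax"],
)

# Writes one op name per line to each file; the trainer then points the
# LOWER_LIST / FP32_LIST environment variables at these files
gaudi_config.write_bf16_fp32_ops_to_text_files("bf16_ops.txt", "fp32_ops.txt")
```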

Comment on lines 62 to 87
def test_write_bf16_fp32_ops_to_text_files(self):
    gaudi_config = GaudiConfig()

    with tempfile.NamedTemporaryFile() as bf16_file:
        with tempfile.NamedTemporaryFile() as fp32_file:
            gaudi_config.write_bf16_fp32_ops_to_text_files(
                bf16_file.name,
                fp32_file.name,
            )

            self.assertTrue(
                filecmp.cmp(
                    bf16_file.name,
                    BF16_OPS_REFERENCE_FILE,
                    shallow=False,
                )
            )
            self.assertTrue(
                filecmp.cmp(
                    fp32_file.name,
                    FP32_OPS_REFERENCE_FILE,
                    shallow=False,
                )
            )
Collaborator

We should keep this test, as write_bf16_fp32_ops_to_text_files will still be used to specify custom op lists.
I propose instantiating a Gaudi config such as:

gaudi_config = GaudiConfig(
    autocast_bf16_ops=[
        "add",
        "addmm",
        "bmm",
        "div",
        "dropout",
        "gelu",
        "iadd",
        "linear",
        "layer_norm",
        "matmul",
        "mm",
        "rsub",
        "softmax",
        "truediv",
    ],
    autocast_fp32_ops=[
        "embedding",
        "nll_loss",
        "log_softmax",
    ],
)
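
With such a config, the test above would mostly stay the same; a sketch of how it could look (assuming the reference files are updated to list exactly these ops):

```
def test_write_bf16_fp32_ops_to_text_files(self):
    # Same op lists as in the Gaudi config proposed above
    gaudi_config = GaudiConfig(
        autocast_bf16_ops=["add", "addmm", "bmm", "div", "dropout", "gelu", "iadd",
                           "linear", "layer_norm", "matmul", "mm", "rsub", "softmax", "truediv"],
        autocast_fp32_ops=["embedding", "nll_loss", "log_softmax"],
    )

    with tempfile.NamedTemporaryFile() as bf16_file, tempfile.NamedTemporaryFile() as fp32_file:
        gaudi_config.write_bf16_fp32_ops_to_text_files(bf16_file.name, fp32_file.name)

        # Generated files should match the checked-in reference op lists
        self.assertTrue(filecmp.cmp(bf16_file.name, BF16_OPS_REFERENCE_FILE, shallow=False))
        self.assertTrue(filecmp.cmp(fp32_file.name, FP32_OPS_REFERENCE_FILE, shallow=False))
```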

Comment on lines -44 to -45
- `hmp_bf16_ops` enables to specify the Torch operations that should be computed in *bf16*. You can find more information about casting rules [here](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Mixed_Precision/PT_Mixed_Precision.html#basic-design-rules).
- `hmp_fp32_ops` enables to specify the Torch operations that should be computed in *fp32*. You can find more information about casting rules [here](https://docs.habana.ai/en/latest/PyTorch/PyTorch_Mixed_Precision/PT_Mixed_Precision.html#basic-design-rules).
Collaborator

I think we should mention:

  • use_torch_autocast, while saying that --bf16 should be favored since use_torch_autocast is only used to define a good pre-defined config
  • autocast_bf16_ops and autocast_fp32_ops, since #308 (Add support for autocast custom ops in GaudiTrainer) enables users to specify custom op lists, while saying that the defaults should work for most models

Collaborator

As discussed by email, regarding autocast_bf16_ops and autocast_fp32_ops, I'm fine with saying that the env variable way should be favored. But they should still be documented.

@@ -57,44 +57,16 @@ To not take them into account in the computation of the throughput at the end of
## Mixed-Precision Training

Mixed-precision training enables to compute some operations using lighter data types to accelerate training.
Habana Mixed Precision (HMP) proposes to mix *fp32* and *bf16* operations.
Optimum-Habana enables mixed precision training in a similar fasion as HuggingFace transofrmers.
Collaborator

Suggested change
Optimum-Habana enables mixed precision training in a similar fasion as HuggingFace transofrmers.
Optimum Habana enables mixed precision training in a similar fashion as 🤗 Transformers.
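
To make "in a similar fashion as 🤗 Transformers" concrete, a rough sketch of enabling bf16 autocast through the training arguments (a sketch assuming a Gaudi machine; argument values are illustrative):

```
from optimum.habana import GaudiConfig, GaudiTrainingArguments

# bf16=True is the same flag as in Transformers and turns on torch autocast on HPU
training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    bf16=True,
)

# No HMP op lists are needed anymore for the common case
gaudi_config = GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True)

# Both objects are then passed to GaudiTrainer(model=..., gaudi_config=gaudi_config, args=training_args, ...)
```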

Comment on lines -62 to -66
<Tip warning={true}>

Please refer to the [list of supported PyTorch operators](https://docs.habana.ai/en/latest/PyTorch/Pytorch_Operators/Pytorch_Operators.html) beforehand to make sure the ones you are interested in are compatible with *bf16*.

</Tip>
Collaborator

I would keep this

Contributor Author

But those operators are incompatible with autocast. HMP and autocast operate on different software levels. Please see: https://docs.habana.ai/en/latest/PyTorch/PyTorch_Mixed_Precision/Autocast.html#override-options
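
For reference, the override mechanism on the autocast side is file-based; a minimal sketch using the environment variable names that appear in this PR's trainer code (paths are placeholders):

```
import os

# Ops listed in LOWER_LIST run in bf16 under autocast;
# ops listed in FP32_LIST are kept in fp32 (placeholder paths)
os.environ["LOWER_LIST"] = "/path/to/bf16_ops.txt"
os.environ["FP32_LIST"] = "/path/to/fp32_ops.txt"
```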

Collaborator

No problem, we don't have to keep the same operators. Maybe it will just be easier to refer to GPT2's Gaudi config.

Comment on lines -69 to -93
Then, you can specify which operators to compute in *bf16* with `"hmp_bf16_ops"` and which operators to compute in *fp32* with `"hmp_fp32_ops"`.
If these operators are not specified, their default values are set to be the ones written in the [Gaudi configuration file of BERT](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json), which is a good starting point for applying HMP:
```
"hmp_bf16_ops": [
"add",
"addmm",
"bmm",
"div",
"dropout",
"gelu",
"iadd",
"linear",
"layer_norm",
"matmul",
"mm",
"rsub",
"softmax",
"truediv"
],
"hmp_fp32_ops": [
"embedding",
"nll_loss",
"log_softmax"
]
```
Collaborator

I would still keep a part of this to show how to specify custom op lists. We can add a link to the GPT2 Gaudi config when it is updated.

Contributor Author

But shouldn't users provide custom lists in a similar way to other training demos outside of HuggingFace? We can keep those in GaudiConfig to make sure they are optimized for a specific model.

Collaborator

IMO users should be able to do both, because those already used to Optimum Habana probably have Gaudi configs with custom op lists; that way, switching to autocast will be easy and they won't be confused.

@regisss regisss added the run-test Run CI for PRs from external contributors label Sep 15, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Collaborator

@regisss regisss left a comment

Thanks for updating the PR! I left a few comments.

Could you also run the following to make the code style check pass please?

pip install black ruff
make style

docs/source/package_reference/gaudi_config.mdx (outdated, resolved)
docs/source/package_reference/gaudi_config.mdx (outdated, resolved)
docs/source/usage_guides/accelerate_training.mdx (outdated, resolved)
tests/test_gaudi_configuration.py (outdated, resolved)
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Sep 19, 2023
Collaborator

@regisss regisss left a comment

LGTM!

Let's wait to see what happens with Wav2Vec2 training before merging anything. cc @libinta

docs/source/usage_guides/accelerate_training.mdx (outdated, resolved)
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Sep 19, 2023
@regisss regisss mentioned this pull request Nov 10, 2023
@regisss
Collaborator

regisss commented Nov 15, 2023

@jwieczorekhabana will the Gaudi config for Wav2Vec2 be the following?

{
  "use_fused_adam": true,
  "use_fused_clip_norm": true,
  "use_torch_autocast": true
}

Or will there be any specific bf16 ops?

@jwieczorekhabana
Contributor Author

@jwieczorekhabana will the Gaudi config for Wav2Vec2 be the following?

{
  "use_fused_adam": true,
  "use_fused_clip_norm": true,
  "use_torch_autocast": true
}

Or will there be any specific bf16 ops?

That'll be it. There is a pending PR on the Hugging Face Hub (https://huggingface.co/Habana/wav2vec2/discussions/2/files) that can be merged after the 1.13 release.
Similarly, for GPT2 there is a closed PR that can be reopened: https://huggingface.co/Habana/gpt2/discussions/4/files

Release 1.13 addresses the issues both of those topologies had.

@regisss
Collaborator

regisss commented Nov 16, 2023

Cool, I'll merge them when 1.13 is released then 👍

@regisss
Collaborator

regisss commented Nov 19, 2023

@jwieczorekhabana Can you rebase this branch on main? There are merge conflicts in tests/test_diffusers.py so GitHub won't let me merge the PR when 1.13 is released.

jwieczorekhabana and others added 4 commits November 22, 2023 09:56
HMP is deprecated in favor of PyTorch autocast
- removed hmp usage
- removed setting autocast env variables through GaudiConfig
- updated tests
- updated docs
- updated README.md files
- restored test_gaudi_config imports
@jwieczorekhabana
Contributor Author

@jwieczorekhabana Can you rebase this branch on main? There are merge conflicts in tests/test_diffusers.py so GitHub won't let me merge the PR when 1.13 is released.

@regisss Done

@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Nov 22, 2023
@regisss
Collaborator

regisss commented Nov 22, 2023

@jwieczorekhabana It seems there are still some merge conflicts in tests/test_diffusers.py

Change-Id: Ibbd216b9b6b5c9104b7dce3021f231eda3748704
@regisss regisss added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Nov 24, 2023
@regisss
Collaborator

regisss commented Nov 24, 2023

@jwieczorekhabana I pushed a few commits to fix the CI checks that were not passing.

@regisss regisss merged commit 2129f91 into huggingface:main Nov 24, 2023
11 of 12 checks passed