Fix deepspeed init_inference reserving device memory as large as the model size #1335

Open
wants to merge 1 commit into base: main

Conversation

yeonsily
Collaborator

For text-generation, set 'keep_module_on_host' to True to save device memory.

For quantization runs, https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_FP8.html#enabling-and-running-inc-in-pytorch-models says: "If DeepSpeed is used, INC should be called after deepspeed.init_inference."
However, when deepspeed.init_inference() is called, it reserves device memory equal to the bf16 model size, because the model is first initialized with the bf16 dtype. This reserves extra device memory.

This change fixes that; a sketch of the resulting flow is below.
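As a rough sketch of the intended call pattern (not the exact diff in this PR), assuming a DeepSpeed build whose inference config exposes 'keep_module_on_host' and the INC 3.x PyTorch API; the model name and quantization config path are illustrative:

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM
from neural_compressor.torch.quantization import FP8Config, convert

# Load the model on the host (CPU) in bf16; nothing is on the device yet.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf", torch_dtype=torch.bfloat16
)

# keep_module_on_host=True asks DeepSpeed to keep the module's weights on
# the host during init_inference, instead of reserving device memory for a
# full bf16 copy of the model.
engine = deepspeed.init_inference(
    model,
    dtype=torch.bfloat16,
    keep_module_on_host=True,  # assumption: exposed by this DeepSpeed build
)
model = engine.module

# Per the Habana docs quoted above, INC is called after
# deepspeed.init_inference, so the FP8 conversion happens without a
# resident bf16 copy of the model on the device.
config = FP8Config.from_json_file("quant_config_fp8.json")  # illustrative path
model = convert(model, config)
```

With 'keep_module_on_host' left at its default, init_inference would move the bf16 weights to the device first, reserving memory equal to the full model size before convert() runs.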

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@yeonsily yeonsily requested review from regisss and removed request for regisss September 16, 2024 22:07
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
