
Gaudi2 support opt-66b(DS mode) #279

Closed · ZhaiFeiyue wants to merge 7 commits

Conversation

ZhaiFeiyue (Collaborator)

What does this PR do?

The current text-generation example only supports BLOOM-176B; it does not support OPT-66B, since OPT-66B cannot fit into a single Gaudi2 device (96 GB).
With this PR, OPT-66B can run on 2 Gaudi2 cards.
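A quick back-of-the-envelope check (my own illustration, assuming bf16 weights at 2 bytes per parameter) of why a single 96 GB device is not enough but two cards are:

```python
# Rough memory estimate for OPT-66B weights, assuming bf16 (2 bytes per parameter).
num_params = 66e9
bytes_per_param = 2
weights_gb = num_params * bytes_per_param / 1e9
print(f"total weights: ~{weights_gb:.0f} GB")         # ~132 GB, more than one 96 GB card
print(f"per card (2-way): ~{weights_gb / 2:.0f} GB")  # ~66 GB, fits on each Gaudi2
```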

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

ZhaiFeiyue (Collaborator, Author) commented Jun 25, 2023

@regisss OPT-66B cannot run on 1 Gaudi2, and it also cannot run with DS 8x, since here the model weights are transferred to the device before weight sharding in deepspeed.init_inference(). I tried switching the OPT path to the BLOOM path, but there are conflicts, such as:

  • DeepSpeed hijacks nn.Embedding, which is also done in modeling_opt.py
  • in _attn, the device of causal_mask is meta, which leads to a runtime device mismatch between tensors

So I changed the device from args.device to cpu, which means loading the weights onto the host and then doing the weight sharding in deepspeed.init_inference(); this works well when running OPT-66B on 2x.
But see here: if the device is cpu and we run with 8x, the weights are loaded from disk into memory 8 times, which wastes a lot of host memory, right?
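For reference, a minimal sketch (my own simplification, not the exact PR code; the model name, dtype, and mp_size are illustrative, and the exact init_inference kwargs depend on the DeepSpeed version) of "load the weights on the host, then shard in deepspeed.init_inference()":

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load the full checkpoint into host (CPU) memory instead of args.device,
# so DeepSpeed can shard the weights afterwards.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-66b", torch_dtype=torch.bfloat16)

# Shard the host-resident weights across 2 Gaudi2 cards.
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)
```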

ZhaiFeiyue changed the title from "Gaudi2 support opt-66b" to "Gaudi2 support opt-66b(DS mode)" on Jun 25, 2023
regisss (Collaborator) commented Jun 27, 2023

Thanks for this PR @ZhaiFeiyue, I'll review it by the end of this week!

regisss (Collaborator) commented Jun 29, 2023

@ZhaiFeiyue So I investigated this a bit, and with a few changes to DeepSpeed I got it to work:

  • replacing torch.half with torch.bfloat16 here and there
  • replacing the forward of OPTEmbedding with more or less the same one as we have here

Do you think we can push these changes to Habana's DeepSpeed fork?
Otherwise we can do some monkey patching in a similar way to what we do to override Transformers modeling. WDYT?
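For illustration, a minimal sketch of that monkey-patching mechanism, assuming the patch would target OPTLearnedPositionalEmbedding in transformers; the body below simply mirrors the upstream forward and is only a placeholder for the actual Gaudi/DeepSpeed-specific change discussed here:

```python
import torch
from transformers.models.opt import modeling_opt


def gaudi_opt_learned_positional_embedding_forward(self, attention_mask, past_key_values_length=0):
    # Same logic as the upstream forward; the DeepSpeed/Gaudi-specific adjustments
    # discussed in this thread would go here instead.
    attention_mask = attention_mask.long()
    positions = (torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask).long() - 1
    positions = positions[:, past_key_values_length:]
    return torch.nn.Embedding.forward(self, positions + self.offset)


# Override the Transformers implementation at import time, as Optimum Habana does for other models.
modeling_opt.OPTLearnedPositionalEmbedding.forward = gaudi_opt_learned_positional_embedding_forward
```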

ZhaiFeiyue (Collaborator, Author)

@regisss thanks for your investigation 😄. With the changes above, you can run text-generation with OPT-66B DS 8x the same way as BLOOM, right? BLOOM is handled specially in run_generation.py (I guess you have another PR that makes changes to run_generation.py).
Or, based on this PR plus your changes, can you already run OPT-66B DS 8x on your side?

regisss (Collaborator) commented Jun 30, 2023

> With the changes above, you can run text-generation with OPT-66B DS 8x the same way as BLOOM, right? […] Or, based on this PR plus your changes, can you already run OPT-66B DS 8x on your side?

Yes, it follows the same path as BLOOM in the script. I'm going to push a new commit so that you can test it.
But the question is: should these changes go directly into Habana's DeepSpeed fork, or just into Optimum Habana?

ZhaiFeiyue (Collaborator, Author)

Cool @regisss. Regarding your question, I would rather do this on the optimum-habana side, since Habana's DeepSpeed is a fork and should not carry too many model-specific changes.

HuggingFaceDocBuilderDev commented Jun 30, 2023

The documentation is not available anymore as the PR was closed or merged.

ZhaiFeiyue (Collaborator, Author) commented Jul 2, 2023

@regisss very clean changes 👍, but I get an error on my side:

[screenshot of the error]

Any ideas, or did I miss something?

regisss (Collaborator) commented Jul 2, 2023

@ZhaiFeiyue Yeah, I get the same error; I need to see where exactly it comes from. When the changes were made directly in DeepSpeed, it was working.
I haven't had time to investigate this much yet, I'll let you know when it is fixed 🙂

regisss (Collaborator) commented Jul 3, 2023

@ZhaiFeiyue The model is now loaded correctly but results are weird:

Input/outputs:
----------------------------------------------------------------------
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('DeepSpeed is a machine learning framework',)
----------------------------------------------------------------------

It's like it is not doing anything.

ZhaiFeiyue (Collaborator, Author)

@regisss yes, same on my side

ZhaiFeiyue (Collaborator, Author)

The output token ids are always 50272 (OPT's vocabulary size):

[screenshot of the generated token ids]

ZhaiFeiyue (Collaborator, Author)

The weights do not seem to be loaded correctly; the red part is dummy weights:

[screenshot of the loaded weights]

regisss (Collaborator) commented Jul 3, 2023

Weird 🤔
I'll be off for a few hours, feel free to keep investigating @ZhaiFeiyue. Otherwise I'll get back to it later.

regisss (Collaborator) commented Jul 3, 2023

@ZhaiFeiyue There must be something wrong in the injection policy. Maybe because of the way we write the JSON checkpoint here:

data = {"type": "BLOOM", "checkpoints": checkpoint_files, "version": 1.0}

I also see that auto-injection will be merged at some point (see here), so if it's planned for v1.11.0 maybe it's better to just wait for it and keep your initial change?
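For context, a hedged sketch of how a descriptor like the one above is typically consumed (file names, shard list, model name, and mp_size are placeholders, not the script's actual values; the injection policy and other kwargs are omitted): the dictionary is dumped to a JSON file whose path is passed to deepspeed.init_inference() via checkpoint=, and DeepSpeed then loads and shards the listed files into a meta-initialized model.

```python
import json
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder shard list; the real script collects these from the local model snapshot.
checkpoint_files = ["pytorch_model-00001-of-00002.bin", "pytorch_model-00002-of-00002.bin"]
data = {"type": "BLOOM", "checkpoints": checkpoint_files, "version": 1.0}
with open("checkpoints.json", "w") as f:
    json.dump(data, f)

# Instantiate the model without loading weights, then let DeepSpeed load and shard
# the checkpoints listed in the descriptor.
config = AutoConfig.from_pretrained("facebook/opt-66b")
with deepspeed.OnDevice(dtype=torch.bfloat16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

model = deepspeed.init_inference(
    model,
    mp_size=8,
    dtype=torch.bfloat16,
    checkpoint="checkpoints.json",
)
```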

ZhaiFeiyue (Collaborator, Author)

@regisss I agree with you. I will open a new PR, and we can keep your changes in this PR.

ZhaiFeiyue (Collaborator, Author)

The new PR is #285.

ZhaiFeiyue (Collaborator, Author)

@regisss I finally got time to debug OPT-66B now 😄. The weights were not loaded correctly because of a name mismatch; your injection works well. I added new changes here:

[screenshot of the changes]

With OPT-125m and OPT-66b the weight names start with model., but with OPT-13b they start with decoder.

See here: the prefix is stripped when level=0, which leads to the name mismatch here.

I have tested 125m and 13b, and will test 66b later when resources are available.

regisss (Collaborator) commented Jul 13, 2023

@ZhaiFeiyue Nice! Let me know if OPT-66b works 🙂

ZhaiFeiyue (Collaborator, Author)

@regisss bad news 😢: 66b does not work, but if I comment out this line it works. This needs more investigation.

ZhaiFeiyue (Collaborator, Author)

@regisss I have checked all the weight names of the BLOOM models (from 560m to 176b); they are all the same.
But for OPT they differ, as I mentioned before:

  • OPT-125m and OPT-66b: prefix = model. and the names start with model.
  • OPT-13b and OPT-30b: prefix = '' and the names start with decoder.

Transformers' from_pretrained() can handle both of the above cases (see here), but in DS the prefix is always removed, I think from here, because ckpt_type is pp and level is 0. So when running OPT-125m and OPT-66b the names mismatch (in the checkpoint bin files they start with model., but in DS the names start with decoder. since model. has been removed) and the weights cannot be loaded.

Based on the above analysis, would it be possible to create an offline weight-converter tool that removes the prefix for 125m and 66b (see the sketch below)?

Correct me if something is wrong.
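A hypothetical sketch of such a converter (the file paths and the stripped prefix are assumptions based on the naming discussed above):

```python
import torch


def strip_prefix(src_path: str, dst_path: str, prefix: str = "model.") -> None:
    """Rewrite a checkpoint shard so that weight names no longer carry the given prefix."""
    state_dict = torch.load(src_path, map_location="cpu")
    renamed = {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
    torch.save(renamed, dst_path)


# Example usage on one shard (placeholder file names):
# strip_prefix("pytorch_model-00001-of-00014.bin", "converted/pytorch_model-00001-of-00014.bin")
```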

regisss (Collaborator) commented Jul 18, 2023

@ZhaiFeiyue I just pushed a rebase.
I'm not a big fan of creating a tool just for that. Let me see with the Transformers team if there is anything they can do on their side.

ZhaiFeiyue (Collaborator, Author)

@regisss that would be great, if the Transformers team can fix it on their side.

ZhaiFeiyue (Collaborator, Author)

@regisss let's close this PR, since the changes for OPT should be made in Habana's DS, as in official DS.

ZhaiFeiyue closed this on Aug 9, 2023
regisss (Collaborator) commented Aug 9, 2023

> @regisss let's close this PR, since the changes for OPT should be made in Habana's DS, as in official DS.

Sounds good!
