
Gaudi2 support opt-66b(DS mode) #279

Closed · ZhaiFeiyue wants to merge 7 commits

Conversation

ZhaiFeiyue (Collaborator)

What does this PR do?

The current text-generation example only supports BLOOM-176B; it does not support OPT-66B, since OPT-66B cannot fit into a single Gaudi2 device (96 GB).
With this PR, OPT-66B can run on 2 Gaudi2 cards.
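A quick back-of-the-envelope check (my own illustration, assuming bf16 weights at 2 bytes per parameter) of why a single 96 GB device is not enough but two cards are:

```python
# Rough memory estimate for OPT-66B weights, assuming bf16 (2 bytes per parameter).
num_params = 66e9
bytes_per_param = 2
weights_gb = num_params * bytes_per_param / 1e9
print(f"total weights: ~{weights_gb:.0f} GB")         # ~132 GB, more than one 96 GB card
print(f"per card (2-way): ~{weights_gb / 2:.0f} GB")  # ~66 GB, fits on each Gaudi2
```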

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

ZhaiFeiyue (Collaborator, Author) commented Jun 25, 2023

@regisss OPT-66B cannot run on 1 Gaudi2, and it also cannot run with DS 8x, since here the model weights are transferred to the device before weight sharding in deepspeed.init_inference(). I tried switching the OPT path to the BLOOM path, but there are conflicts, such as:

  • DeepSpeed hijacks nn.Embedding, which is also done in modeling_opt.py
  • in _attn, the device of causal_mask is meta, which leads to a runtime device mismatch between tensors

So I changed the device from args.device to cpu, which means loading the weights onto the host and then doing the weight sharding in deepspeed.init_inference(); this works well when running OPT-66B on 2x.
But see here: if the device is cpu and we run with 8x, the weights are loaded from disk into memory 8 times, which wastes a lot of host memory, right?
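For reference, a minimal sketch (my own simplification, not the exact PR code; the model name, dtype, and mp_size are illustrative, and the exact init_inference kwargs depend on the DeepSpeed version) of "load the weights on the host, then shard in deepspeed.init_inference()":

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load the full checkpoint into host (CPU) memory instead of args.device,
# so DeepSpeed can shard the weights afterwards.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-66b", torch_dtype=torch.bfloat16)

# Shard the host-resident weights across 2 Gaudi2 cards.
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)
```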

ZhaiFeiyue changed the title from "Gaudi2 support opt-66b" to "Gaudi2 support opt-66b(DS mode)" on Jun 25, 2023
regisss (Collaborator) commented Jun 27, 2023

Thanks for this PR @ZhaiFeiyue, I'll review it by the end of this week!

regisss (Collaborator) commented Jun 29, 2023

@ZhaiFeiyue So I investigated this a bit, and with a few changes to DeepSpeed I got it to work:

  • replacing torch.half with torch.bfloat16 here and there
  • replacing the forward of OPTEmbedding with more or less the same one as we have here

Do you think we can push these changes to Habana's DeepSpeed fork?
Otherwise we can do some monkey patching in a similar way to what we do to override Transformers modeling. WDYT?
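For illustration, a minimal sketch of that monkey-patching mechanism, assuming the patch would target OPTLearnedPositionalEmbedding in transformers; the body below simply mirrors the upstream forward and is only a placeholder for the actual Gaudi/DeepSpeed-specific change discussed here:

```python
import torch
from transformers.models.opt import modeling_opt


def gaudi_opt_learned_positional_embedding_forward(self, attention_mask, past_key_values_length=0):
    # Same logic as the upstream forward; the DeepSpeed/Gaudi-specific adjustments
    # discussed in this thread would go here instead.
    attention_mask = attention_mask.long()
    positions = (torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask).long() - 1
    positions = positions[:, past_key_values_length:]
    return torch.nn.Embedding.forward(self, positions + self.offset)


# Override the Transformers implementation at import time, as Optimum Habana does for other models.
modeling_opt.OPTLearnedPositionalEmbedding.forward = gaudi_opt_learned_positional_embedding_forward
```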

ZhaiFeiyue (Collaborator, Author)

@regisss thanks for your investigation 😄. With the changes above, you can run text-generation with OPT-66B DS 8x the same way as BLOOM, right? BLOOM is handled specially in run_generation.py (I guess you have another PR that makes changes to run_generation.py).
Or, based on this PR plus your changes, can you already run OPT-66B DS 8x on your side?

regisss (Collaborator) commented Jun 30, 2023

> With the changes above, you can run text-generation with OPT-66B DS 8x the same way as BLOOM, right? […] Or, based on this PR plus your changes, can you already run OPT-66B DS 8x on your side?

Yes, it follows the same path as BLOOM in the script. I'm going to push a new commit so that you can test it.
But the question is: should these changes go directly into Habana's DeepSpeed fork, or just into Optimum Habana?

ZhaiFeiyue (Collaborator, Author)

Cool @regisss. Regarding your question, I would rather do this on the optimum-habana side, since Habana's DeepSpeed is a fork and should not carry too many model-specific changes.

HuggingFaceDocBuilderDev commented Jun 30, 2023

The documentation is not available anymore as the PR was closed or merged.

ZhaiFeiyue (Collaborator, Author) commented Jul 2, 2023

@regisss very clean changes 👍, but I get an error on my side:

[screenshot of the error]

Any ideas, or did I miss something?

regisss (Collaborator) commented Jul 2, 2023

@ZhaiFeiyue Yeah, I get the same error; I need to see where exactly it comes from. When the changes were made directly in DeepSpeed, it was working.
I haven't had time to investigate this much yet, I'll let you know when it is fixed 🙂

regisss (Collaborator) commented Jul 3, 2023

@ZhaiFeiyue The model is now loaded correctly but results are weird:

Input/outputs:
----------------------------------------------------------------------
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('DeepSpeed is a machine learning framework',)
----------------------------------------------------------------------

It's like it is not doing anything.

ZhaiFeiyue (Collaborator, Author)

@regisss yes, same on my side

ZhaiFeiyue (Collaborator, Author)

The output token ids are always 50272 (OPT's vocabulary size):

[screenshot of the generated token ids]

ZhaiFeiyue (Collaborator, Author)

The weights do not seem to be loaded correctly; the red part is dummy weights:

[screenshot of the loaded weights]

regisss (Collaborator) commented Jul 3, 2023

Weird 🤔
I'll be off for a few hours, feel free to keep investigating @ZhaiFeiyue. Otherwise I'll get back to it later.

regisss (Collaborator) commented Jul 3, 2023

@ZhaiFeiyue There must be something wrong in the injection policy. Maybe because of the way we write the JSON checkpoint here:

data = {"type": "BLOOM", "checkpoints": checkpoint_files, "version": 1.0}

I also see that auto-injection will be merged at some point (see here), so if it's planned for v1.11.0 maybe it's better to just wait for it and keep your initial change?
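For context, a hedged sketch of how a descriptor like the one above is typically consumed (file names, shard list, model name, and mp_size are placeholders, not the script's actual values; the injection policy and other kwargs are omitted): the dictionary is dumped to a JSON file whose path is passed to deepspeed.init_inference() via checkpoint=, and DeepSpeed then loads and shards the listed files into a meta-initialized model.

```python
import json
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder shard list; the real script collects these from the local model snapshot.
checkpoint_files = ["pytorch_model-00001-of-00002.bin", "pytorch_model-00002-of-00002.bin"]
data = {"type": "BLOOM", "checkpoints": checkpoint_files, "version": 1.0}
with open("checkpoints.json", "w") as f:
    json.dump(data, f)

# Instantiate the model without loading weights, then let DeepSpeed load and shard
# the checkpoints listed in the descriptor.
config = AutoConfig.from_pretrained("facebook/opt-66b")
with deepspeed.OnDevice(dtype=torch.bfloat16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

model = deepspeed.init_inference(
    model,
    mp_size=8,
    dtype=torch.bfloat16,
    checkpoint="checkpoints.json",
)
```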

ZhaiFeiyue (Collaborator, Author)

@regisss I agree with you. I will open a new PR, and we can keep your changes in this PR.

ZhaiFeiyue (Collaborator, Author)

The new PR is #285.

ZhaiFeiyue (Collaborator, Author)

@regisss I finally got time to debug OPT-66B now 😄. The weights were not loaded correctly because of a name mismatch; your injection works well. I added new changes here:

[screenshot of the changes]

With OPT-125m and OPT-66b the weight names start with model., but with OPT-13b they start with decoder.

See here: the prefix is stripped when level=0, which leads to the name mismatch here.

I have tested 125m and 13b, and will test 66b later when resources are available.

regisss (Collaborator) commented Jul 13, 2023

@ZhaiFeiyue Nice! Let me know if OPT-66b works 🙂

ZhaiFeiyue (Collaborator, Author)

@regisss bad news 😢: 66b does not work, but if I comment out this line it works. This needs more investigation.

ZhaiFeiyue (Collaborator, Author)

@regisss I have checked all the weight names of the BLOOM models (from 560m to 176b); they are all the same.
But for OPT they differ, as I mentioned before:

  • OPT-125m and OPT-66b: prefix = model. and the names start with model.
  • OPT-13b and OPT-30b: prefix = '' and the names start with decoder.

Transformers' from_pretrained() can handle both of the above cases (see here), but in DS the prefix is always removed, I think from here, because ckpt_type is pp and level is 0. So when running OPT-125m and OPT-66b the names mismatch (in the checkpoint bin files they start with model., but in DS the names start with decoder. since model. has been removed) and the weights cannot be loaded.

Based on the above analysis, would it be possible to create an offline weight-converter tool that removes the prefix for 125m and 66b (see the sketch below)?

Correct me if something is wrong.
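A hypothetical sketch of such a converter (the file paths and the stripped prefix are assumptions based on the naming discussed above):

```python
import torch


def strip_prefix(src_path: str, dst_path: str, prefix: str = "model.") -> None:
    """Rewrite a checkpoint shard so that weight names no longer carry the given prefix."""
    state_dict = torch.load(src_path, map_location="cpu")
    renamed = {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
    torch.save(renamed, dst_path)


# Example usage on one shard (placeholder file names):
# strip_prefix("pytorch_model-00001-of-00014.bin", "converted/pytorch_model-00001-of-00014.bin")
```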

regisss (Collaborator) commented Jul 18, 2023

@ZhaiFeiyue I just pushed a rebase.
I'm not a big fan of creating a tool just for that. Let me see with the Transformers team if there is anything they can do on their side.

ZhaiFeiyue (Collaborator, Author)

@regisss that would be great, if the Transformers team can fix it on their side.

ZhaiFeiyue (Collaborator, Author)

@regisss let's close this PR, since the changes for OPT should be made in Habana's DS, as in official DS.

ZhaiFeiyue closed this on Aug 9, 2023
regisss (Collaborator) commented Aug 9, 2023

> @regisss let's close this PR, since the changes for OPT should be made in Habana's DS, as in official DS.

Sounds good!
