Auto convert moe param groups #5354

Merged
merged 9 commits into microsoft:master on Apr 5, 2024

Conversation

jeffra
Collaborator

@jeffra commented Apr 2, 2024

When using frameworks like HF Accelerate with MoE models in HF, there is an issue when DeepSpeed creates the optimizer: we have no way to automatically create compatible MoE param groups. This PR detects, when no client optimizer is set and model_parameters are passed to DeepSpeed, whether those parameters already form MoE-compatible param groups, and converts them automatically if they do not.

This was never an issue previously because (1) MoE hasn't really been tested outside MDS, and (2) MDS manually converts the weight-decay param groups into MoE-compatible groups before deepspeed.initialize, as sketched below.
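
For context, here is a minimal sketch of the manual conversion this PR automates, using the split_params_into_different_moe_groups_for_optimizer helper from deepspeed.moe.utils. The build_moe_model helper and ds_config are hypothetical placeholders, and the call pattern is illustrative rather than copied from MDS:

```python
import torch
import deepspeed
from deepspeed.moe.utils import split_params_into_different_moe_groups_for_optimizer

model = build_moe_model()             # hypothetical: any model containing MoE layers
ds_config = {"train_batch_size": 8}   # hypothetical minimal DeepSpeed config

# Plain param groups (e.g. split by weight decay) know nothing about MoE.
param_groups = [
    {"params": [p for p in model.parameters() if p.requires_grad], "weight_decay": 0.01},
]

# Manually split expert params into their own groups tagged with 'moe': True
# before handing the optimizer to DeepSpeed.
param_groups = split_params_into_different_moe_groups_for_optimizer(param_groups)

optimizer = torch.optim.AdamW(param_groups, lr=1e-4)
engine, optimizer, _, _ = deepspeed.initialize(model=model, optimizer=optimizer, config=ds_config)
```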

The error raised when the param groups are not MoE compatible comes from here:
https://github.com/microsoft/DeepSpeed/blob/cc897ecf15fdac5437fa4a2743154dc6c1749da4/deepspeed/runtime/zero/stage_1_and_2.py#L610-L612

    assert any(
        [self.is_moe_group(group) for group in self.optimizer.param_groups]
    ), "The model has moe layers, but None of the param groups are marked as MoE. Create a param group with 'moe' key set to True before creating optimizer"

Tagging @tohtana and @ykim362 to help review

Two review threads on deepspeed/moe/utils.py (resolved).
@loadams enabled auto-merge April 5, 2024 16:25
@loadams added this pull request to the merge queue Apr 5, 2024
Merged via the queue into microsoft:master with commit 42a8eaa Apr 5, 2024
12 checks passed
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024