Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Distributors to make it acutally do its jobs #68

Closed

Conversation

tsunghsienlee
Copy link
Contributor

@tsunghsienlee tsunghsienlee commented Dec 20, 2024

Summary:
In the current Shampoo design, there are three major components: DistributedShampoo, which describes the high-level algorithm flow; Distributor, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and PreconditionerList, which contains the detailed algorithm implementation.

The Distributor manages the distribution of parameters and then sends them to DistributedShampoo which invokes the corresponding algorithms implemented by each PreconditionerList.

Since the Distributor handles parameter distribution, ideally, downstream classes (i.e., DistributedShampoo and PreconditionerList) should not need to use Distributor.distributor_selector to compress parameters for their use. However, in the current implementation, such usages appear in both DistributedShampoo and PreconditionerList.

The main changes are listed as follows:

  1. Distributor now only provides access to the local version of params and block_info to align with its responsibilities.
  2. Distributor no longer needs to store global_block_info_list, the global version of block_info; instead, it stores local_block_info_list.
  3. PreconditionerList no longer requires distributor_selector as an input argument because Distributor already performs this task.

Differential Revision: D64708506

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 20, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

@tsunghsienlee tsunghsienlee requested a review from runame December 20, 2024 18:22
tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 20, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 21, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

@tsunghsienlee tsunghsienlee requested a review from runame December 21, 2024 22:28
tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 21, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 22, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

Copy link
Contributor

@runame runame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now (ignoring the internal failures).

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 23, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: anana10c, chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: anana10c, chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D64708506

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Dec 23, 2024
…ch#68)

Summary:

In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP, etc.); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` manages the distribution of parameters and then sends them to `DistributedShampoo` which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, ideally, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should not need to use `Distributor.distributor_selector` to compress parameters for their use. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are listed as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info` to align with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Reviewed By: anana10c, chuanhaozhuge

Differential Revision: D64708506
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 618d857.

@tsunghsienlee tsunghsienlee deleted the export-D64708506 branch December 24, 2024 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants